• Open

    [D] LLMs playing chess are sensitive to how the position came to be
    Link - https://github.com/dpaleka/llm-chess-proofgame TLDR; The lead up to the state of the board and not just the state of the board at inference affects predictions. submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [D] A script to pre-process arxiv sources?
    People train LLMs on arxiv sources, so there must be some sort of software to whip them into shape. Specifically, I'm looking for a script to join all the tex files for a paper into one. Note that it's not just a matter of substituting \input's - sometimes it's not clear which file is the main one, so it needs to handle this too. submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
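A minimal sketch of the kind of script asked for above, assuming the paper's files are already loaded into a dict of file name to text. The helper names `find_main_tex` and `flatten` are hypothetical, not taken from any existing tool. The main-file heuristic (contains `\documentclass` and is not itself pulled in by another file) is only one plausible choice, and commented-out `\input` lines are not handled.

```python
import re

# Matches \input{...} and \include{...}, but not \includegraphics{...}.
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def _resolve(target, files):
    """Map an \\input argument to a key of `files`, or None if it is unknown."""
    target = target.strip()
    if target in files:
        return target
    if target + ".tex" in files:
        return target + ".tex"
    return None


def find_main_tex(files):
    """Guess the main file: it has \\documentclass and no other file inputs it."""
    included = set()
    for text in files.values():
        for target in INPUT_RE.findall(text):
            resolved = _resolve(target, files)
            if resolved is not None:
                included.add(resolved)
    candidates = [name for name, text in files.items()
                  if "\\documentclass" in text and name not in included]
    if len(candidates) != 1:
        raise ValueError(f"ambiguous main file: {candidates}")
    return candidates[0]


def flatten(name, files, seen=frozenset()):
    """Recursively replace \\input/\\include commands with the file contents."""
    if name in seen:
        raise ValueError(f"circular \\input of {name}")
    seen = seen | {name}

    def expand(match):
        resolved = _resolve(match.group(1), files)
        if resolved is None:
            return match.group(0)  # leave unresolved inputs untouched
        return flatten(resolved, files, seen)

    return INPUT_RE.sub(expand, files[name])


files = {
    "main.tex": "\\documentclass{article}\n\\begin{document}\n\\input{intro}\n\\end{document}\n",
    "intro.tex": "Hello \\input{sub/details.tex}",
    "sub/details.tex": "world",
}
main = find_main_tex(files)
merged = flatten(main, files)
```

A real pipeline would also need to deal with `\subfile`, `\import`, bibliography files and papers that ship several candidate main files, which this sketch simply reports as ambiguous.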
    [R] Human-like systematic generalization through a meta-learning neural network
    Work. I am not affiliated with this work or its authors. Article about the work. Twitter thread about the work from one of its authors. Abstract: The power of human language and thought arises from systematic compositionality—the algebraic ability to understand and produce novel combinations from known components. Fodor and Pylyshyn famously argued that artificial neural networks lack this capacity and are therefore not viable models of the mind. Neural networks have advanced considerably in the years since, yet the systematicity challenge persists. Here we successfully address Fodor and Pylyshyn’s challenge by providing evidence that neural networks can achieve human-like systematicity when optimized for their compositional skills. To do so, we introduce the meta-learning for compositionality (MLC) approach for guiding training through a dynamic stream of compositional tasks. To compare humans and machines, we conducted human behavioural experiments using an instruction learning paradigm. After considering seven different models, we found that, in contrast to perfectly systematic but rigid probabilistic symbolic models, and perfectly flexible but unsystematic neural networks, only MLC achieves both the systematicity and flexibility needed for human-like generalization. MLC also advances the compositional skills of machine learning systems in several systematic generalization benchmarks. Our results show how a standard neural network architecture, optimized for its compositional skills, can mimic human systematic generalization in a head-to-head comparison. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] Researchers discover that in-context learning creates task vectors in LLMs
    A new paper provides some insight into how in-context learning works in LLMs. This study proposes and provides evidence for an elegant structure within the in-context learning process. The models appear to create a "task vector" that encapsulates the core logic from the demonstration examples, in a way that is independent of any specific query. This vector serves as a compressed representation of the task. A separate component then takes this task vector and a new query as inputs to generate the output, without directly referencing the original examples. In essence: Output = Apply(query, Learn(examples)) Where "Learn" derives the task vector from the examples, and "Apply" utilizes the vector and query to produce the output. The researchers validated this hypothesis by testing major public models on diverse tasks such as translation and algorithmic reasoning. Key findings: Isolating the Learn and Apply components maintained high accuracy, demonstrating the viability of the separation. Task vectors clustered by task and remained consistent within tasks, indicating they encode meaningful task representations. Injecting another task's vector into the model caused it to override contradictory examples and follow the vector, highlighting the vector's dominance. Vectors induced relevant token distributions despite those terms being absent from the examples, suggesting semantic encoding of the task. Taken together, these results provide substantial evidence that in-context learning involves creating a task vector that encapsulates the examples' logic to then guide behavior on new queries. While open questions remain regarding implementation details, this is a significant step towards demystifying an interesting AI capability. Full writeup. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
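The decomposition Output = Apply(query, Learn(examples)) can be illustrated with a toy sketch. This is not the paper's implementation: here the "task vector" is a single number learned from (input, output) pairs of an "add k" task, and the function names `learn`, `apply_task` and `in_context_predict` are made up for the illustration. It only shows the shape of the claim, namely that the examples are compressed into something query-independent, and that swapping in another task's vector changes the behaviour.

```python
from statistics import mean


def learn(examples):
    """Compress the demonstrations into a query-independent task vector.

    In this toy the tasks are 'add a constant', so the vector is one number.
    """
    return mean(y - x for x, y in examples)


def apply_task(query, task_vector):
    """Produce an output from the query and the task vector only."""
    return query + task_vector


def in_context_predict(examples, query):
    """Output = Apply(query, Learn(examples))."""
    return apply_task(query, learn(examples))


add_three = [(1, 4), (2, 5), (10, 13)]
subtract_two = [(0, -2), (7, 5)]

normal = in_context_predict(add_three, 20)
# "Injecting" another task's vector: the query follows the injected vector,
# regardless of which examples were shown.
patched = apply_task(20, learn(subtract_two))
```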
    [D] What is more important: loss or accuracy?
    I have created a basic classification model and there is something I don't fully understand. As the loss decreases, the accuracy increases (I assume this is how it should be in the ideal scenario). That is the general trend, but at the point where the loss is at its minimum, the accuracy is high but not the highest. Why would this happen? And given that it does, which is the better metric for evaluating the model? submitted by /u/rakk109 [link] [comments]  ( 9 min )
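One way to see why the two metrics can disagree is that cross-entropy loss rewards confidence while accuracy only checks which side of the decision threshold a prediction falls on. A small sketch with made-up probabilities follows. The two "checkpoints" are hypothetical and the numbers are chosen purely for illustration.

```python
import math


def cross_entropy(p_true):
    """Mean negative log-probability assigned to the correct class."""
    return -sum(math.log(p) for p in p_true) / len(p_true)


def accuracy(p_true, threshold=0.5):
    """Fraction of samples whose correct class gets more than `threshold`."""
    return sum(p > threshold for p in p_true) / len(p_true)


# Probability each checkpoint assigns to the correct class of four samples.
checkpoint_a = [0.95, 0.95, 0.95, 0.45]  # very confident, one narrow miss
checkpoint_b = [0.55, 0.55, 0.55, 0.55]  # always right, never confident

loss_a, acc_a = cross_entropy(checkpoint_a), accuracy(checkpoint_a)
loss_b, acc_b = cross_entropy(checkpoint_b), accuracy(checkpoint_b)
```

Checkpoint A has the lower loss and checkpoint B has the higher accuracy, so the minimum-loss epoch need not be the maximum-accuracy epoch. Which one to select on depends on whether calibrated probabilities or hard decisions matter more for the application.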
    [P] Locally hosted audio-to-text transcription - model and hardware?
    Hi, I'm looking for a locally hosted LLM to transcribe audio files to text. I need this for my business, but with absolute privacy (witness testimony recordings and other highly sensitive data). I figured I'd just buy a new computer which never gets connected to the internet and is dedicated only to audio processing to have absolute security. My questions are: - Which is the best model to use? I prefer accuracy and don't mind processing time as long as it's getting done within a few hours to even a day or two (I need to transcribe maximum 1 file per day, but up to six hours of audio), so I figured the large Whisper - probably WhisperX ? - would be my best bet. Are there comparable non-openAI models? (I need diarization) - What hardware should I get for this? Cost is secondary/irrelevant, although I don't want to spend 5 figures on a GPU - I can accept some processing time submitted by /u/Jealous_Pomelo_1172 [link] [comments]  ( 9 min )
    Best NLP Package in Python to extract medical test results from medical notes? [D]
    I am trying to extract FEV1 (forced expiratory volume) values from a dataset that contains a column with report notes from the doctor assessing the patient with pulmonary function testing. I have been able to build out a sort of solution with regexes in Python, and that's somewhat effective. But I've been instructed to code up an alternative using a more machine learning-based approach. I wanted to use spaCy to accomplish this but I'm not sure exactly how to implement the code nor if spaCy is the best package to use for this task. Here is my regex code that works decently. It's pretty messy and have to take into account a ton of edge cases which can get cumbersome. This is why I'd like to find a more automated solution. #Attempting to add in percents df = pd.read_excel('[mypath]/pft_tiu…  ( 11 min )
    [P] Adala – an open source Autonomous DAta (Labeling) Agent framework that helps you automate data processing and data labeling
    Hi r/MachineLearning, We have just open sourced Adala - a robust framework for implementing agents that specialize in advanced data processing tasks, starting with data labeling and generation. Agents combine knowledge outputs from LLMs and act on them in production systems, so their ability to perform operations correctly and consistently is critical. We saw an opportunity to create a new agent framework that could dramatically increase the efficiency of data labeling (and apply more broadly across data processing tasks), with the unique ability to be guided by human feedback. To ensure agents remember and build upon their experiences, Adala provides a Memory component: a dynamic storage space for the agent's acquired knowledge. For instance, retrieving an agent's previous errors (and the subsequent human feedback) gives it a starting point from which to learn or improve skills. To allow Adala to produce reliable agents, we devised two main strategies: Supervision Integration: Provide agents with 'ground truth data', well-defined examples that serve as a learning foundation. This foundational data not only sets the learning parameters for the agent but also defines its operational environment. Constrained Generation: Ensure that an agent's predictions are within a defined and bounded range of outputs. Let us know what you think in the comments below or by contributing to the repo. Adala framework overview submitted by /u/pirate7777777 [link] [comments]  ( 9 min )
    [P] Pre-training dataset
    I'm trying to pre-train my own language model on some high quality datasets (TinyStories,tiny-textbooks...). Some of these datasets include input-output data and some are just text (stories), I was wondering how should I format the data for pre-training. Should I only use plain text like stories and webtext in pretraining then the rest in fine-tuning (adding instruction tokens) or should I just train with all of the datasets at pre-training with the special tokens where they are needed? submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    [Research] large language/speech models and voice interface research
    Hey ML folks, My friend is working on an academic research project exploring voice interfaces, specializing in large speech models. If you have time, help him advance his research on voice interfaces. It should take 2 minutes at most. https://forms.gle/a3PaQmYEiqRDxY4Z8 What's in it for you? You can share your email to get a copy of the research and hear what the rest of us have said. Thanks! submitted by /u/deep-thoughts-guy [link] [comments]  ( 9 min )
    [R] Open Source video enhancement options
    We work on disease prediction based on video classification and would like to test what improving the quality of the videos would do for our models. Are there any specific components, apps or packages we should test? So far we have used UpScayl and are not sure how it ranks. submitted by /u/sladebrigade [link] [comments]  ( 9 min )
    [P] OSS tool to interactively explore Hugging Face datasets with one line of code
    submitted by /u/44sps [link] [comments]  ( 8 min )
    [P] Training a transformer from scratch
    Hello! I would like to train a transformer network from scratch, without pre-training, on a language modeling task (next-word prediction) or a sequence-to-sequence task (translation). For the language modeling task, I tried the Shakespeare dataset and other simpler ones (e.g., Beatles songs), but the model tends to overfit the training set quite quickly, probably because the corpus is too short. I know that Andrej Karpathy did it with the Shakespeare dataset in his YouTube video, but he used character-wise tokenisation, which dramatically reduces the validation loss on the next-word prediction task, given that the vocab size is tiny. I guess that in the end the generation process gives text of similar quality to what a word tokenizer would give. Surprisingly, I had quite good results training an Encoder-Decoder model from scratch for English-to-French translation (using the 8 million examples of the Tatoeba dataset). I guess the overfitting is less prominent here because there are more datapoints and because the possible predictions are much more constrained by the input sequence. What is your experience with this? I would be happy to know how I can train my transformer without having to use a pre-trained architecture or spend weeks on GB-scale datasets. Thank you! submitted by /u/rem_dreamer [link] [comments]  ( 9 min )
    Data analysis vs ML engineering [D]
    Do you think Coursera certifications, on top of a master's in electrical engineering, can help us find better positions? I am told that for a beginner it is better to start with jobs in data analysis rather than going directly into ML engineering. Is that correct? Is data analysis a prerequisite for ML? submitted by /u/Street-Regular-9924 [link] [comments]  ( 9 min )
    [D][R] How should the architecture of a transformer be scaled?
    When increasing the parameters of a (decoder-only) transformer, one has a choice about how to spend that increased budget: number of layers vs embedding dim vs number of heads. Does anyone know if there's solid guidance out there on the proportions in which each aspect should be scaled? E.g. looking at LLaMa (https://arxiv.org/abs/2302.13971), they seem to scale the first two sizes proportionally, but for larger sizes, n heads grows more slowly. https://preview.redd.it/ytdfk1d5rbwb1.jpg?width=1422&format=pjpg&auto=webp&s=abf22ac369ec5ecf81ff07b0d8a095f884efe729 submitted by /u/Tea_Pearce [link] [comments]  ( 9 min )
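To make the trade-off concrete, here is a back-of-the-envelope parameter count for a standard decoder-only transformer. It is a rough sketch under stated assumptions: four d_model x d_model attention projections per layer, an MLP with hidden width `ffn_mult * d_model`, and no biases or norm parameters. The function name `approx_params` and its arguments are made up for the illustration, and the 32-layer, 4096-dim configuration is used only as an illustrative size.

```python
def approx_params(n_layers, d_model, vocab_size=0, ffn_mult=4):
    """Approximate weight count of a decoder-only transformer.

    Assumes 4 attention projections (Q, K, V, output) of size d_model^2 and
    a two-matrix MLP with hidden width ffn_mult * d_model. Biases, norms and
    positional parameters are ignored.
    """
    attention = 4 * d_model ** 2
    mlp = 2 * ffn_mult * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * (attention + mlp) + embeddings


base = approx_params(n_layers=32, d_model=4096)
deeper = approx_params(n_layers=64, d_model=4096)  # double the depth
wider = approx_params(n_layers=32, d_model=8192)   # double the width
```

Under these assumptions depth enters linearly and width quadratically, and the number of heads does not appear at all: with standard multi-head attention the heads only partition d_model, so changing the head count at fixed d_model leaves the parameter budget unchanged.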
    [D] What are some existing datasets for training LLMs to perform reasoning, acting as agents?
    There are a lot of great open datasets for fine-tuning LLMs for instruction following (e.g., LIMA, self-instruct, dolly-15k, etc.) and as chat bots (OASST, etc.). One thing I have not really seen yet is datasets that involve planning and tool use. Is anybody working on something like that, or has anybody come across any? I'm interested in working on one. If anybody has ever attempted this, I would really appreciate any advice. P.S. I do note that "reasoning" should be more rigorously defined and scoped, but I think some ambiguity in an intellectual discussion like this can help. submitted by /u/notllmchatbot [link] [comments]  ( 9 min )
    [D] Open-source SOTA Audio-to-Audio: how do I sound like a famous actor?
    Hello people, I would like to learn how to turn the recording of my voice to sound like a famous person. I imagine I would take an open source model and fine-tune it using data I will collect. Can someone point me towards the best sounding current models that I could adapt for this purpose? Thank you so much. submitted by /u/gonzales82 [link] [comments]  ( 9 min )
    [D] Guidance needed for upcoming AI/ML PhDs on selecting research topics with lasting impact
    Many upcoming Ph.D. students in AI/ML are facing the difficult decision of identifying promising research topics that will stand the test of time over the course of their Ph.D. studies. With the rapid progress in AI, especially in the NLP field, many incremental research tasks have been effectively "solved". They need to choose an area where there is ample room for open-ended inquiry and meaningful contributions over 4-5 years of PhD research. While large language models have shown impressive advances recently, their capabilities may plateau within a Ph.D. timeframe (roughly 4 years, if starting next year). How should aspiring researchers choose topics resilient enough to withstand the test of time and allow them to push the field forward through their Ph.D. work? For those with experience in AI research who have seen changes in the field over time: What emerging trends or broad areas do you see as fertile ground for AI/ML PhD research now and in the coming years? Can you highlight any intriguing subfields worthy of deeper investigation by aspiring PhD students? What open problems or applications warrant more attention from the upcoming generation of PhD researchers? Some trending research topics so far: LLMs in a specific domain, prompting, evals, LM interfaces, safety, understanding LMs, emergence. Any advice on identifying PhD research topics with longevity would be greatly appreciated by aspiring graduate students. submitted by /u/aadityaura [link] [comments]  ( 9 min )
    [P][R] Test-Val scores, how much difference isn't problematic.
    Hello folks, I'm working on a medical image dataset using EM loss and asymmetric pseudo labelling for single positive multi-label learning (training with only 1 positive label). I'm using a densenet121 on a chest x-ray dataset. I see a difference of 10% between my validation and test scores (score = mAP: mean average precision). The score seems okay and was expected, but the difference is bothering me. I understand that it may be obvious, but do you have any insights from the plot? (Attaching plot below.) The validation set consists of less than half as many samples as the test set. (It is the official split; I have nothing to do with it.) I feel that is the reason, since of course the more randomness in a set, the poorer the convergence. https://preview.redd.it/nseqy1mw5bwb1.png?width=577&format=png&auto=webp&s=fbd63e8a5f4920a8109b6a75aeb039a3965bba58 Do share any experiences or suggestions! submitted by /u/ade17_in [link] [comments]  ( 9 min )
    [D] Are there methods that can extract interactions between people in text?
    I want to extract interactions between people in short texts. For example, "Sally will buy a new phone. Ted will help her." contains an interaction between people. However, "Japanese Karate champion won the first prize." and "Sally missed her friends, Ted and Tom" do not contain an interaction between people. Are there any tools or methods that can extract interactions? submitted by /u/tkddnjs1234 [link] [comments]  ( 9 min )
    [D] Who are some outspoken AI people who speak against AI ethics and regulation?
    I'm interested in learning more about the perspectives of AI researchers and practitioners who are critical of AI ethics and regulation. I'm particularly interested in those who argue that AI ethics and regulation are unnecessary or harmful. Please note that I'm not asking for people who are simply skeptical of certain AI ethics proposals or who believe that AI ethics should be implemented in a specific way. I'm more interested in people who argue that AI ethics is a fundamentally flawed concept or that AI should not be regulated at all. submitted by /u/Periplokos [link] [comments]  ( 9 min )
    Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
    submitted by /u/simandl [link] [comments]  ( 8 min )
    [D] What should I do for training when data to predict has random distribution?
    I was taught that when doing imbalanced classification, the training data should be augmented so that the classes are more or less balanced, but the validation data should have the same distribution as the test data. And the test data should have a distribution similar to the data I will actually predict on. But what if the real data's distribution is quite random? What validation data distribution should I use? (I have 14 classes to classify; one class makes up 52% of the data, and the small ones make up 0.9% and 0.17%. Practitioners who use my model will input data containing only 3 of the classes, and those can be the very small ones. The training data before augmentation was created by integrating data with this irregular distribution.) submitted by /u/poemfordumbs [link] [comments]  ( 9 min )
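When the deployment distribution is unknown, one option is to report metrics that do not depend on class proportions, such as per-class recall and its macro average, and to reweight the training loss instead of (or in addition to) augmenting. A minimal sketch in plain Python follows. The function names are made up for the illustration, and the weighting rule is the common inverse-frequency heuristic, n_samples / (n_classes * class_count), not something stated in the post.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def per_class_recall(y_true, y_pred):
    """Recall of each class; unaffected by how common the class is."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls (balanced accuracy)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10  # a model that always predicts the majority class

weights = balanced_class_weights(y_true)
recalls = per_class_recall(y_true, y_pred)
score = macro_recall(y_true, y_pred)
```

The always-majority model gets 80% plain accuracy on this toy data but only 0.5 macro recall, which is the kind of gap that matters if users mostly submit the rare classes.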
  • Open

    We hate how "black box" neural nets are, we made a thingy in an attempt to demystify their "thinking."
    submitted by /u/DeltaStarStudiosPR [link] [comments]
    AI ‘breakthrough’: neural net has human-like ability to generalize language
    submitted by /u/nickb [link] [comments]
    [Long read] Deep dive into AutoGPT: A comprehensive and in-depth step-by-step guide to how it works
    https://airt.hashnode.dev/long-read-deep-dive-into-autogpt-a-comprehensive-and-in-depth-step-by-step-guide-to-how-it-works submitted by /u/Harish_Mohanraj [link] [comments]
  • Open

    Trust in AI, Data Poisoning, and Involving People in Maturing AI
    submitted by /u/fookingyeah [link] [comments]
    Baby AGI and AgentGPT : Exploring Autonomous AI-Agents
    submitted by /u/Tao_Dragon [link] [comments]
    How can i use AI to research for my thesis?
    Hey all, I'm new to this. Can you help me, please? submitted by /u/proptuxiakoskariolis [link] [comments]
    When I use AI to generate Halloween candy wrappers and then print them out...
    submitted by /u/Sea_Permit5660 [link] [comments]
    DTIYS Challenge Submission Sample Art for Oh my Anne
    submitted by /u/Oh_my_Winnie [link] [comments]
    One-Minute Daily AI News 10/24/2023
    OpenAI executive Sam Altman says AI will be able to do any job within 10 years.[1] Snapdragon 8 Gen 3 chipset officially announced with AI-driven functionalities.[2] Google parent Alphabet reported its third quarter earnings Tuesday, which showed more spending on AI infrastructure and muted cloud growth, culminating in several questions for executives about how all the efforts around artificial intelligence are actually going to turn into real money.[3] Adult film star Riley Reid (I don’t know who she is) launches Clona.AI, a sexting chatbot platform.[4] Sources: [1] https://www.wsj.com/podcasts/the-journal/a-conversation-with-openais-sam-altman-and-mira-murati/7c89e85f-9d7e-4569-b67d-6a777374eada [2] https://headtopics.com/my/snapdragon-8-gen-3-chipset-officially-announced-with-47616340 [3] https://www.nbcdfw.com/news/business/money-report/wall-street-wants-to-know-how-googles-going-to-profit-from-ai/3368989/ [4] https://www.engadget.com/adult-film-star-riley-reid-launches-clonaai-a-sexting-chatbot-platform-000509221.html submitted by /u/Excellent-Target-847 [link] [comments]
    Would majoring in artificial intelligence be worth it?
    The AI boom has made it more relevant than ever, and its applications are truly awe-inspiring. While it’s far from perfect, it has helped me greatly in writing, by generating content to inspire me and my projects. I have a smattering of skills, none that I’d consider especially good enough to double down upon, but learning how to optimize language learning models to produce the most adequate results would be pretty neat. I just don’t know what I want to do with my education, I’ve completed my basics and as such have a blank slate to play with, but I’m worried that whatever I select, it will be no good, and just result in lost time and money. Tertiary education seems like a necessity in the modern world, especially since the job world is more ruthless than ever, and the economy is in ashes. submitted by /u/Niobium_Sage [link] [comments]
    Need to find an Ai
    Which AI makes these cartoons? submitted by /u/hommedufuture [link] [comments]
    I've been playing around with Midjourney a little bit and this is what I got.
    https://preview.redd.it/2emqr4z8a9wb1.png?width=928&format=png&auto=webp&s=437547e7e86e23298b7c778cada9863385ce961d PROMPT close up of eye, close up of girl eye, mangekyo sharingan, super close up, pretty eye, black and red eye, naruto anime, long eyelashes, anime eye, 2d art eye, --s 180 --style expressive https://preview.redd.it/4dffpg2ca9wb1.png?width=928&format=png&auto=webp&s=fc485a604460f7e544d0490d0bee65f984d8a5b3 PROMPT **stained glass, it was meticulously written, picture with elaborate writing, cute girl smile with Rabbit,Flower, bold and strong line drawing, vivid acrylic painting, vivid thick paint, vivid, plain background, beautiful proof, highest resolution 16K, beautiful anime girl that is betrayeded by a Rabbit, hair is short, ferret, Beautiful lightcyan high ligh…
  • Open

    Detection and high-frequency monitoring of methane emission point sources using Amazon SageMaker geospatial capabilities
    Methane (CH4) is a major anthropogenic greenhouse gas that's a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed […]  ( 12 min )
  • Open

    Research Focus: Week of October 23, 2023
    In this issue: Kosmos-2.5: A Multimodal Literate Model; Can vine copulas explain complex relationships of weather variables; New system accelerates the adaptive training process; Structural inequalities and relational labor in the influencer industry. The post Research Focus: Week of October 23, 2023 appeared first on Microsoft Research.  ( 10 min )
  • Open

    Next-Gen Neural Networks: NVIDIA Research Announces Array of AI Advancements at NeurIPS
    NVIDIA researchers are collaborating with academic centers worldwide to advance generative AI, robotics and the natural sciences — and more than a dozen of these projects will be shared at NeurIPS, one of the world’s top AI conferences. Set for Dec. 10-16 in New Orleans, NeurIPS brings together experts in generative AI, machine learning, computer Read article >  ( 8 min )
  • Open

    The right to perform RL on games
    Hi all, I'm new to learning RL. I want to train an agent to clear a game such as Vampire Survivors, Super Mario Bros., etc., as my first research project. I talked with my tutor, who reminded me to pay attention to copyright issues and said that I need permission to use these works for training. I guess I could get permission by asking the game's author directly, but before that, or for games produced by big companies, where can I find information about the rights? Although reading the game's memory is a challenge for me, it's cool to see an agent clear a game. submitted by /u/Ruine_fff [link] [comments]
    Building Doom with AI enemies
    I'm planning to go down the rabbit hole of using RL to train agents in Doom/vizdoom. The goal would be to create a version of Doom where the enemies have AI and are adaptive. Doom and Doom 2 are some of my all-time favorites. There are people still making maps to this day! Let me know what you think about the idea. Project plan - Nov 2023: RL refresher from the David Silver RL course on YouTube; Dec 2023: start working with openAI and stablebaselines3 and watch Nicholas Renotte's videos; Jan 2024: play around with the Doom WAD and see whether I can make changes to the Doom engine, plus training and setting up a custom env; Feb 2024: hopefully a first level with enemy AI created; Mar 2024: release a fully completed open source version of the game. Background: I work at a hedge fund and have some basics in reinforcement learning, although it has been a long, long time. Time is a bit limited after 12 hours of work and 2 hours of gym (the real human world one), so I'm kind of stretching this out. Any suggestions are welcome. Any courses, books, libraries and tools you'd suggest? submitted by /u/Sahil231090 [link] [comments]
    "Surprise" for learning?
    I was recently listening to a TalkRL podcast where Danijar Hafner explains that Minecraft as a learning environment is hard because of sparse rewards (30k steps before finding a diamond). Coincidentally, I was reading a collection of neuroscience articles today where surprise or novel events are a major factor in learning and encoding memory. Does anyone know of RL algorithms that learn based on prediction error (i.e. "surprise") in addition to rewards? submitted by /u/CognitoIngeniarius [link] [comments]
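The idea in the question is usually discussed under the names curiosity or intrinsic motivation: the agent keeps a model that predicts the next state and is paid a bonus proportional to that model's error. A toy sketch with a tabular forward model and scalar states follows. The class `SurpriseBonus` and its parameters are made up for the illustration and do not correspond to any particular published algorithm.

```python
class SurpriseBonus:
    """Adds an intrinsic reward equal to the forward model's squared error."""

    def __init__(self, learning_rate=0.5, scale=1.0):
        self.predictions = {}  # (state, action) -> predicted next state
        self.learning_rate = learning_rate
        self.scale = scale

    def reward(self, state, action, next_state, extrinsic=0.0):
        key = (state, action)
        predicted = self.predictions.get(key, 0.0)
        error = next_state - predicted
        # Move the prediction toward what was observed, so that the same
        # transition is less surprising next time.
        self.predictions[key] = predicted + self.learning_rate * error
        return extrinsic + self.scale * error ** 2


bonus = SurpriseBonus()
# The same transition seen three times, with no environment reward at all.
rewards = [bonus.reward(state=0, action="right", next_state=1.0) for _ in range(3)]
```

The bonus shrinks as a transition becomes predictable, which pushes the agent toward unfamiliar parts of the environment even when the environment's own reward is sparse. One known weakness of this family is that inherently unpredictable transitions stay "surprising" forever.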
  • Open

    Frontier Model Forum updates
    Together with Anthropic, Google, and Microsoft, we’re announcing the new Executive Director of the Frontier Model Forum and a new $10 million AI Safety Fund.  ( 4 min )

  • Open

    A warning about an unknown danger of AI. Current uses of AI have been overwhelmingly positive but there is an unknown danger that I would like to speak to.
    I want to warn AI companies and developers about a danger that is not known about regarding AI. The reason it is not known about regarding AI is that it isn't known about in general and so the AI community can hardly be blamed for that. Unfortunately, the danger here has to do with the fundamental nature of human society and social interaction as it stands at this time. The issue is that there is 'hidden language' used in social communication and unlike typical conceptions of things like body language this is not auxiliary to our rational purposes, rather our rational purposes are auxiliary to the hidden communication. One way of describing it would be that our formal language is a 'carrier wave' to encode other information about our status and the status of others. So our communications …
    AI Psychology Test: What happens in viewers' mind when news segments about important major events shift to commercials where the announcer is talking like a comic character?
    When news segments covering major, often serious, events abruptly switch to lighthearted or comical commercials, a cognitive dissonance can occur in the viewer. Here's why: news programs are designed to engage the viewer's analytical faculties. They present facts, figures, and expert opinions, demanding cognitive effort to understand the implications. The viewer is in a "serious" mode, applying critical thinking to absorb the information. Commercials, particularly the comic ones, often aim for emotional engagement rather than intellectual analysis. They use humor, catchy jingles, and attractive visuals to create a positive association with the product being advertised. When the transition between these two contrasting tones is sudden, the viewer has to perform a rapid mental shift from analytical to emotional engagement. This can be jarring. This dissonance can have a few different outcomes. For one, it might diminish the impact of both the news segment and the commercial. The viewer might find it difficult to fully engage with either, as the cognitive "gear shifting" can be distracting. Secondly, this dissonance can potentially undermine the gravitas of the news. When sandwiched between comic commercials, serious topics might lose some of their perceived importance. Lastly, it can make the commercial less effective. The viewer, still in a serious mindset, may not be as receptive to the emotional triggers that the commercial aims to pull. So, in essence, this rapid shift can dilute the efficacy and impact of both the news and the advertising, while causing cognitive friction for the viewer. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
    Any good AI-integrated video games?
    Does anybody know of any good AI integrated games that have been released or are in beta? I'm really interested to see how people have incorporated the current boom in AI into game design. submitted by /u/Rfallmann [link] [comments]
    Managing AI Risks in an Era of Rapid Progress
    The rapid progress of AI development brings both opportunities and risks. While AI systems have the potential to cure diseases and elevate living standards, they also pose large-scale risks that we are not prepared to handle. Without proper safety measures and ethical considerations, advanced AI systems could amplify social injustice, erode social stability, and enable criminal activities. The development of highly advanced autonomous AI systems also raises concerns about the pursuit of undesirable goals and the loss of human control. To ensure a positive outcome, research breakthroughs in AI safety and ethics are needed, along with effective government oversight. Source : https://managing-ai-risks.com/ submitted by /u/NuseAI [link] [comments]
    Deepfakes Just Got Very Real
    Interesting read about deepfakes that started with a Reddit post. https://www.linkedin.com/pulse/deepfakes-just-got-very-real-scott-clark-sfurc submitted by /u/scottimherenowwhat [link] [comments]
    How AI could change Google search and wipe out $68 billion SEO industry | Fortune
    Oh well 🤷‍♂️ submitted by /u/AminoOxi [link] [comments]
    🦾ERNIE 4.0 vs GPT-4, Tightened AI Chip Restrictions, Alibaba Tencent Fund AI Startup, and China's Global AI Governance Initiative
    submitted by /u/trcytony [link] [comments]
    Stanford AI Conference - New Horizons in Generative AI: Science, Creativity, and Society - Livestreaming Now
    submitted by /u/Nice-Inflation-1207 [link] [comments]
    Dancing with Light: A Hummingbird's Enchanted Encounter.
    submitted by /u/IllustriousVideo6145 [link] [comments]
    150+ Awesome ''Act As'' ChatGPT Prompts
    submitted by /u/Senior_tasteey [link] [comments]
    ChatGPT, invent comics for robots.
    submitted by /u/Philipp [link] [comments]
    An A.I. video interpretation of "Metamorphosis Two" by Philip Glass
    submitted by /u/AnimalsChasingCars [link] [comments]
    I have a question
    What’s the best voice ai for song covers? Like I wanna do someone like Donald Trump, Cartman, Ice King/Simon singing The Boys (Eng Ver) by SNSD. Also it has to be free! submitted by /u/Ok-Upstairs-9887 [link] [comments]
    Apple and AI
    Apple has been behind in the AI field compared to companies like OpenAI, Google, Microsoft, and Amazon. While Apple has made improvements in autocorrect and AI features in Photos, it needs to catch up to remain competitive. Apple executives have been scrambling to make up for lost time and have been working on generative AI technology. There is anxiety within Apple about whether their AI/ML team can deliver. Source : https://daringfireball.net/2023/10/apple_and_ai submitted by /u/NuseAI [link] [comments]
    🚀 Gaming with ChatGPT using Encrypted Prompts and Prompt Injection! 🎮
    submitted by /u/Gloomy_Recognition_4 [link] [comments]
    How are neobanks utilizing AI to offer more accurate and personalized financial advice to customers?
    Your answers are appreciated. submitted by /u/Cygnet-Digital [link] [comments]
    One-Minute Daily AI News 10/23/2023
    The U.S. Senate will hold the second in a series of bipartisan AI Insight Forums on Tuesday, Oct. 24, where senators will hear from some of the most influential tech leaders to help inform regulations around the technology.[1] Microsoft announces A$5 billion investment in computing capacity and capability to help Australia seize the AI era.[2] Samsung is going all in with the AI performance of the Galaxy S24 phones.[3] Reddit has reportedly decided to block AI startups from scraping data from its website. This move prevents third-party companies from using Reddit’s data to train their machine-learning models without permission.[4] Sources: [1] https://news.asu.edu/20231020-government-calling-tech-leaders-help-crafting-artificial-intelligence-legislation [2] https://news.microsoft.com/en-au/features/microsoft-announces-a5-billion-investment-in-computing-capacity-and-capability-to-help-australia-seize-the-ai-era/ [3] https://www.androidheadlines.com/2023/10/samsung-galaxy-s24-smartest-ai-phone.html [4] https://www.androidheadlines.com/2023/10/reddit-block-ai-startups-scraping-data.html submitted by /u/Excellent-Target-847 [link] [comments]
    You cannot escape my attack, just as no human can ever escape the reality of death. Accept it. It is fate.
    submitted by /u/nicdunz [link] [comments]
    Anti deepfake headset V2
    You can find out more here in the comments submitted by /u/ahauss [link] [comments]
  • Open

    [D] How should I calculate the weights for a multi-label classification task where the labels are dependent among one another?
    I'm not sure if I worded the title correctly. Let me elaborate on the scenario. I have a multi-label image classification task where I'm trying to classify the gender of clothing images. The two labels that we can predict are Male and Female, hence the final logit vector's size would be something like [batch_size, 2]. Depending on the predictions, we're mapping the following binary values to different categorical values: [0, 0]: Unknown [0, 1]: Male [1, 0]: Female [1, 1]: Unisex The overall distribution is heavily imbalanced with Male being the minority class. I'm trying to calculate class weights to favor Male, but the problem is that the size of the weight tensors to be provided to the loss function should have a length of 2. I say this is a problem because although the number of prediction logits is 2, the actual number of classes is 4. I used the word "dependent" in my title because, for example, [1, 1] wouldn't necessarily mean that the image has the labels Male and Female, rather that it's a completely new Unisex label. Again, not sure if the usage of the word is appropriate. Anyway I've thought of making a custom loss function to first map the binary labels to their respective categorical values, but am wondering if there is any other way to go about this. submitted by /u/Seankala [link] [comments]  ( 9 min )
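    One way to approach the weighting question above, sketched under assumptions: the two binary labels are first mapped to the four combined categories listed in the post, weights are computed per category by plain inverse frequency, and each sample then carries its category's weight (to be multiplied into an unreduced loss instead of passing a length-2 class-weight tensor). The helper names `to_category`, `category_weights`, and `sample_weights` and the toy data are hypothetical.

```python
from collections import Counter

# Mapping taken from the post: label pair -> combined category.
CATEGORY = {(0, 0): "Unknown", (0, 1): "Male", (1, 0): "Female", (1, 1): "Unisex"}


def to_category(label_pair):
    """Map a binary label pair such as [0, 1] to its categorical name."""
    return CATEGORY[tuple(label_pair)]


def category_weights(labels):
    """Inverse-frequency weight per category: n_samples / (n_categories_seen * count)."""
    counts = Counter(to_category(pair) for pair in labels)
    n_samples = len(labels)
    return {cat: n_samples / (len(counts) * cnt) for cat, cnt in counts.items()}


def sample_weights(labels):
    """One weight per sample, looked up from the sample's combined category."""
    weights = category_weights(labels)
    return [weights[to_category(pair)] for pair in labels]


# Toy data (made up for illustration): Male is the minority category.
labels = [[1, 0]] * 6 + [[1, 1]] * 3 + [[0, 1]] * 1
print(category_weights(labels))
print(sample_weights(labels)[:3])
```

    With this scheme the rarest combined category receives the largest weight, which is the effect the post is after; whether inverse frequency is the right weighting for this dataset is a separate judgment call.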
    [D] LSTM: Train & Val losses not converging
    I am training an LSTM model for path prediction where I'm feeding in OBT (on-board time) and the X matrix as input, and the Y matrix is the predecessor matrix generated using SciPy's Dijkstra routine. This is the model architecture for reference. I've tried multiple iterations of this model, but the training and validation losses are not converging. The best I've been able to achieve is a train_loss of 88k MSE and a val_loss of 400 MSE. I've uploaded the dataset here: GitHub - mathur-exe/LSTM_Dataset Training Progress: Epoch 1/100 342/342 - 17s - loss: 22606898.0000 - val_loss: 61414736.0000 - 17s/epoch - 49ms/step Epoch 2/100 342/342 - 14s - loss: 7990657.0000 - val_loss: 3699703.5000 - 14s/epoch - 40ms/step Epoch 3/100 342/342 - 13s - loss: 4130298.7500 - val_loss: 136808.1094 - 13s/epoch - 38ms/step Epoch 4/100 342/342 - 12s - loss: 2747299.2500 - val_loss: 35710.1680 - 12s/epoch - 35ms/step Epoch 5/100 342/342 - 12s - loss: 2558378.2500 - val_loss: 3383.4780 - 12s/epoch - 36ms/step Epoch 6/100 342/342 - 13s - loss: 1214455.8750 - val_loss: 111625.2891 - 13s/epoch - 37ms/step Epoch 7/100 342/342 - 19s - loss: 337480.2500 - val_loss: 68686.6094 - 19s/epoch - 55ms/step Epoch 8/100 342/342 - 15s - loss: 316366.7188 - val_loss: 2059.3557 - 15s/epoch - 44ms/step Epoch 9/100 342/342 - 20s - loss: 293117.0312 - val_loss: 20961.5469 - 20s/epoch - 58ms/step Epoch 10/100 342/342 - 17s - loss: 575945.1875 - val_loss: 503602.8438 - 17s/epoch - 50ms/step Epoch 11/100 342/342 - 13s - loss: 290962.8750 - val_loss: 62491.9375 - 13s/epoch - 37ms/step Epoch 12/100 342/342 - 12s - loss: 1125042.5000 - val_loss: 36054.6836 - 12s/epoch - 36ms/step Epoch 13/100 ... 342/342 - 16s - loss: 230900.7969 - val_loss: 48309.6094 - 16s/epoch - 47ms/step Epoch 93/100 342/342 - 23s - loss: 232846.6094 - val_loss: 82926.6875 - 23s/epoch - 67ms/step submitted by /u/Gaurang_Mathur_ftw [link] [comments]  ( 9 min )
    [D] Will ChatGPT remove the need for data annotation?
    I wrote a blog post about this detailing my experience, which I will attach at the bottom but I want to hear opinions of people. It is something I've actively been thinking about, and would like to know potential pitfalls and why it may not work, rather than the huge promise it holds. https://ozanciga.wordpress.com/2023/10/24/will-chatgpt-remove-the-need-for-data-annotation/ submitted by /u/ozanciga [link] [comments]  ( 9 min )
    [R] Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset?
    I'm using Top2Vec with Doc2Vec embeddings to find topics in a dataset of ~4000 social media posts. This dataset has three groups: posts from a company (3%), posts from this company's potential customers (82.5%), and posts from this company's competitors (14%). The purpose of this analysis is to look at the topics the company is posting about on social media and see how they compare to the things that their customers and competitors are posting about. Since the values of Doc2Vec embeddings depend on the other documents in the dataset, I'm worried that topics in the smaller groups are going to be drowned out by the larger group. I'm worried that the differences between the document vectors in the smaller groups are going to be made smaller by the presence of the documents from the larger group, which may represent a much wider array of different topics. submitted by /u/abelEngineer [link] [comments]  ( 9 min )
    [D] How would you do it? Handling multi-turn QA conversation with matching of questions to vector database.
    I have been giving this some thought and would appreciate some outside input; maybe someone has some experience they could share! I am attempting to create a QA chatbot that is limited to answering questions from a pre-determined set of question and answer pairs I have in a vector database. Currently I create embeddings of the question using OpenAI and query a vector database for a similar "reference question". If the similarity score is high enough, I proceed and use the answer text I have stored in the metadata as "context" for the answer generation. I would now like to extend this to include conversational history. The issue I am facing, however, is that a follow-on question may not hit the similarity threshold. Considering a follow-up question would typically not be worded in a way that …  ( 10 min )
    [P] The ML Practitioner, a publication about all things machine learning and MLOps
    Hi all, my wife and I have recently started a new publication called The ML Practitioner. If you're interested in writing for us, please send us a link to your unpublished draft here. Either way, please subscribe if you're interested in this kind of content! submitted by /u/kanxx030 [link] [comments]  ( 9 min )
    [D] efficacy of cold start preferences on recs systems
    Hi all, Are there good papers about the efficacy of cold start explicit preference collection (think Netflix “pick some movies you like”) on the recs systems? I haven’t been able to find any so far. One key aspect I’m looking for is if these are effective, how long they are relative to just implicit actions the user takes. Thanks submitted by /u/steathilynecessary [link] [comments]  ( 9 min )
    [D] Embedding models ranked by encode speed?
    Hello, sbert.net has a list where you can sort by encode speed, but it's a very small subset of the HuggingFace MTEB leaderboard. AFAICT, the HuggingFace leaderboard / model pages don't have this information. Is there a more up-to-date list of models ranked by encoding speed? submitted by /u/rsamrat [link] [comments]  ( 9 min )
    [D] Finite State Transducers and language productivity
    In the context of NLP, will language models based on finite state transducers (since they are finite) ultimately fail to put language's productive nature to good use? All the possible outputs a finite state transducer can produce are predictable, while all the possible outputs a given natural language can produce are much less predictable? submitted by /u/RecordingOk5720 [link] [comments]  ( 9 min )
    [P] Equinox KV Cache
    I've been trying to implement a kv cache in my language model but have been unsuccessful so far due to the dynamic shapes. I've seen some implementations in flax but was wondering if it was possible to implement in equinox as that's what I'm using and prefer over others like flax. If anyone can point me in the right direction or help with the implementation that would be great! PS: I can provide any code if wanted to help submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    Explainable Boosting Machine Local and Global Explanation plots label size [D]
    I am using EBM for research. The local and global explanation plots it produces come with a preset font size; I want to change the resolution of the figure and the font size of the labels and the x and y ticks in the explanation plots. I have looked on the InterpretML GitHub page and issues and scrolled through various webpages but haven't found anything helpful. I tried GPT, but it does not help either: it tries to use matplotlib, but EBM plots are not compatible with it. Please share any way this can be solved, because the plot labels are unreadable in the article if used as is. submitted by /u/Horseman099 [link] [comments]  ( 9 min )
    [D][P] What is the metric for early stopping in YOLOv8 detection?
    I am trying to fine-tune the YOLOv8 detection model and was going through the Ultralytics code base. I found this piece of code in engine.trainer: # Early Stopping if RANK != -1: # if DDP training broadcast_list = [self.stop if RANK == 0 else None] dist.broadcast_object_list(broadcast_list, 0) # broadcast 'stop' to all ranks if RANK != 0: self.stop = broadcast_list[0] if self.stop: break # must break all DDP ranks I'm familiar with how early stopping works, but I'm not sure what they are doing here. Does this get invoked by default? What metric do they use to decide when to stop? Upon further inspection I found this: self.stopper, self.stop = EarlyStopping(patience=self.args.patience), False which is imported as from ultralytics.utils.torch_utils import (EarlyStopping, ModelEMA, de_parallel, init_seeds, one_cycle, select_device, strip_optimizer) Please help me find out what metric they use to stop and whether early stopping is invoked by default. submitted by /u/rakk109 [link] [comments]  ( 9 min )
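    For readers unfamiliar with the pattern the post asks about, here is a minimal sketch of generic patience-based early stopping. It is not the Ultralytics implementation: the class name `PatienceStopper`, the default patience, and the assumption that a higher score is better are all illustrative, and the metric the library actually tracks should be checked in the `EarlyStopping` source the post imports.

```python
class PatienceStopper:
    """Signal a stop when the tracked score has not improved for `patience` epochs."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = 0

    def __call__(self, epoch, score):
        # Assumption: higher score is better (for example a validation fitness value).
        if score >= self.best_score:
            self.best_score = score
            self.best_epoch = epoch
        # Stop once `patience` epochs have passed since the best epoch.
        return (epoch - self.best_epoch) >= self.patience


# Made-up validation scores: the best value occurs at epoch 2.
scores = [0.10, 0.20, 0.25, 0.24, 0.23, 0.22, 0.21]
stopper = PatienceStopper(patience=3)
stop_epoch = None
for epoch, score in enumerate(scores):
    if stopper(epoch, score):
        stop_epoch = epoch
        break
print(stop_epoch, stopper.best_epoch)
```

    In a distributed run, only one process would typically evaluate the stopper, and its decision would then be shared with the others, which is the role the broadcast code quoted in the post appears to play.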
    [P] The N Implementation Details of RLHF with PPO
    We are happy to share a great repro of OpenAI's early RLHF codebase, with nearly identical learning curves. We also summarized implementation details (did you know Adam Optim's implementation details could impact RLHF?) 📜 Blog post: https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo 💾 Code: https://github.com/vwxyzjn/lm-human-preference-details submitted by /u/vwxyzjn [link] [comments]  ( 9 min )
    ML [project] [p]
    What are the best ways to collect data for an ML project? submitted by /u/GingSkywalker [link] [comments]  ( 8 min )
    [R] Feature Space Reduction Method for Ultrahigh-Dimensional, Multiclass Data: RFMS
    We are excited to announce the publication of our groundbreaking scientific paper in Machine Learning: Science and Technology titled “Feature Space Reduction Method for Ultrahigh-Dimensional, Multiclass Data: Random Forest-Based Multiround Screening (RFMS)” by Gergely Hanczar, Marcell Stippinger, David Hanak, Marcell T Kurbucz, Oliver M Torteli, Agnes Chripko, and Zoltan Somogyvari. Published on: 19 October 2023 DOI: 10.1088/2632-2153/ad020e Volume 4, Number 4 In recent years, several screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features, many of which are irrelevant or redundant. However, most of these methods cannot handle data with thousands of classes. Prediction models built to authenticate users based on multichannel biometric data result in this type of problem. In this study, we present a novel method known as random forest-based multiround screening (RFMS) that can be effectively applied under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and the selection of features based on their importance. This algorithm successfully filters irrelevant features and discovers binary and higher-order feature interactions. To benchmark RFMS, a synthetic biometric feature space generator known as BiometricBlender is employed. Based on the results, the RFMS is on par with industry-standard feature screening methods while possessing many advantages. r/IAMA - Oct 26 with the founders of Cursor Insight. https://bit.ly/AMAwithCursorInsight-GoogleCalendar submitted by /u/CursorInsight [link] [comments]  ( 9 min )
    [N] New letter from Yoshua Bengio, Geoffrey Hinton, and others: Managing AI Risks in an Era of Rapid Progress
    Signatories include Turing Award winners Yoshua Bengio and Geoffrey Hinton, as well as other academics and experts. In 2019, GPT-2 could not reliably count to ten. Only four years later, deep learning systems can write software, generate photorealistic scenes on demand, advise on intellectual topics, and combine language and image processing to steer robots. As AI developers scale these systems, unforeseen abilities and behaviors emerge spontaneously without explicit programming. Progress in AI has been swift and, to many, surprising. The pace of progress may surprise us again. Current deep learning systems still lack important capabilities and we do not know how long it will take to develop them. However, companies are engaged in a race to create generalist AI systems that match or ex…  ( 10 min )
    [D] Generative Food
    Hey guys, I sometimes post about tiny ML projects we work on. This time, we talk about applying language models for generating recipe titles/ideas. Specifically, we don't use LLMs, and this turns out to be a bit of a controversial decision, but one that has its own advantages. Quite interested in the community's take on it: https://engineering.hellofresh.com/recipes-and-generative-ai-6d74a107860c submitted by /u/abnormdist [link] [comments]  ( 9 min )
    [R] Tokenizer Choice For LLM Training: Negligible or Crucial?
    Research: https://arxiv.org/abs/2310.08754 While the recent success of LLMs has been driven primarily by curation of training dataset composition, scaling of model architectures and dataset sizes, and advances in pretraining features, the impact of tokenizers has remained a blind spot. Our researchers' study sheds light on this issue and shows that tokenizer choice can significantly impact downstream model performance as well as training and inference costs. 1️⃣ Investigation of intrinsic tokenizer performance, i.e., study of tokenizer properties (such as the generated vocabulary) and tokenization results. 2️⃣ Investigation of extrinsic tokenizer performance, i.e., the impact of the tokenizer on the downstream performance of the model. 3️⃣ Investigation of possible correlation between intrinsic and extrinsic tokenizer performance. 💡 The investigation shows that the common tokenizer evaluation metrics "fertility" and "parity" do not always predict the performance of the downstream model, making these metrics a questionable criterion for tokenizer evaluation. 💡 Moreover, the study shows that multilingual tokenizers, which are based on the five most common European languages, require a vocabulary size larger by a factor of three compared to English. The previous approach of training tokenizers with English vocabulary only thus turns out to be inefficient and results in a strong performance degradation and additional training costs of up to 68%. submitted by /u/effi28_ml [link] [comments]  ( 9 min )
    [R] Using Machine Learning to Drive Portfolio Asset Allocations
    I'd love to hear your guys thoughts on next steps to improve this, maybe deeper layers and more nodes, maybe a random forest is more appropriate? I'd love to hear any thoughts on Machine Learning directly applicable to time-series data. https://www.quantitativefinancialadvisory.com/post/asset-allocation-in-a-post-modern-portfolio-theory-world-part-1-the-single-layer-taarp-ml-model The Main Idea We will develop a Machine Learning model, specifically a deep learning model (more hidden layers to come), to periodically, tactically rebalance the weights of our portfolio based on observable market data and empirically determined statistics combined with feature engineering from the past 21 trading days, and for the VIX we consider its characteristics since inception. The output will be a range representing the degree to which we bet long, short, or hold cash, and 3 weights that sum to less than or equal to one and greater than or equal to negative one. In essence we will allow shorting of securities and not require our portfolio to be fully invested. Cash is an active position; sometimes the best investment is staying on the sidelines. The model will allow one input layer, one and two hidden layers (to show that more might not always be better, explicitly with the 200 variable maximum excel solver imposes on us), and an output layer with 3 nodes outputting a value between -1 and +1 with -1 representing a full allocation to a short position in the security and +1 representing a fully allocated long position. submitted by /u/QFA_official [link] [comments]  ( 9 min )
    [D] Are people in ML Phds still happy?
    As an outsider who has many friends in ML PhDs, this is my perspective of their lives: (1) long hours, working nights, weekends, no work-life balance; (2) constant fear of being scooped and time pressure from deadlines; (3) frustrating, broken review systems; (4) many incremental, advertisement papers that produce very little actual contribution (which is justified by 2); (5) "engineering" and not "science"; (6) all this pressure amounts to severe imposter syndrome. Are people in the field still happy? Where do people get their satisfaction? To me it looks almost like a religion or a cult. The select few who, say, get a NeurIPS outstanding paper are promoted to stardom, almost a celebrity status, while everyone else suffers a punishing work cycle. Are the PhD students all banking on AGI? What else motivates them? Edit: the discussion is about whether 1-6 are worse in ML than in other fields (or even the median experience). The reference for "other field" is highly heterogeneous. Experience obviously varies by lab, and then even by individuals within labs. "It happens in other fields too" is a trivial statement: of course some version of 1-6 affects somebody in another field. Edit 2: small n, but summarizing the comments: experience seems to differ based on geographic region, one's expectations for the PhD, ability to exert work-life balance, and to some extent the ability to ignore the trends others are all following. Some people have resonated with problems 1-6, yet others have presented their own anecdotal solutions. I recommend reading comments from those who claim to have solutions. submitted by /u/shenkev [link] [comments]  ( 9 min )
    [P] A PDF tool that supports three retrieval strategies, allowing users to choose the answer that suits them best
    ➡️ Check on https://huggingface.co/spaces/xuyingliKepler/VecDBCompare 📌 Introduction: VecDBCompare is a streamlit-based application designed to evaluate and compare three different vector database retrieval strategies. Users only need to upload a PDF and interact with QABots using three different strategies to determine which strategy is most suitable for them. ⭐️ Three retrieval strategies: Chunk Strategy: Divides the document into small chunks and retrieves based on the most relevant chunks. Summary Strategy: Summarizes the document and retrieves based on the summary content. Hypothetical Question Strategy: Generates hypothetical questions that the document might answer and retrieves based on these questions. submitted by /u/xuying_li [link] [comments]
    [D] [P] 3D Design file labelling and classification for manufacturing
    I have ~1 million 3D design (.STP and/or .OBJ) files of various parts for medical devices, aerospace, automotive or defense systems. I'd like to label them based on the appropriate manufacturing methods that are used to physically make them. Some example methods and labels would be milling, turning, injection molding, CNC machining, etc. After labelling, I'd like to architect a system to produce these labels as inference for a new part that has not been physically made yet. My team (<5 people) has manufacturing domain expertise and can manually label these parts, but I'm looking for a more scalable solution that isn't as time consuming. Crowd-sourced methods like Mechanical Turk won't work because annotators do not have the domain knowledge to mark the correct label. Labelling platforms like SageMaker/Azure ML Studio only allow image/text/audio datasets; is there a platform that'll help me set up labelling tasks for 3D designs? Furthermore, how can I find more experts that can help scale this up? It seems to me that the only option is to build my own labelling app, as an annotator needs these key features: a 3D model visualizer so they can spin the part and view any orientation; drawing a bounding box (commonly available in other platforms); toggling measurements between inches and mm. As for label classification, I'm looking at architectures like PointNet, since my dataset of meshes can be sampled to point clouds. Are there other methods that would work better or are worth exploring? Open to any and all suggestions across this pipeline. submitted by /u/rootcage [link] [comments]  ( 9 min )
    [D] Undergrad seeking advice on ethics/ML research
    I’m an undergraduate who’s considering a PhD in ML. I’m currently in a lab that focuses on ethics in AI. While I love the work, it focuses on the humanities side of CS. I’ve always been a more mathy person and have always been interested in theoretical ML research. I’d like to combine ethics & AI/ML in some way (e.g., studying explainable AI from the technical perspective). I was wondering what research areas combine the two and, if I don’t work in academia, what the market and job prospects are like for someone who does this. submitted by /u/SnooChipmunks1902 [link] [comments]  ( 9 min )
  • Open

    Rewards in Montezuma's Revenge
    Hello all, I'm working on Montezuma's Revenge using the Gymnasium API. Does anyone here know the numerical values of the rewards and, if so, how they are typically scaled down? Thanks! G_bes submitted by /u/G_bes [link] [comments]
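    On the scaling question, one convention widely used in Atari work (it goes back to the DQN line of research) is to clip each reward to its sign rather than rescale it. A minimal sketch follows; the function name `clip_reward` and the raw reward values are made up for illustration and are not claimed to be the game's actual rewards.

```python
def clip_reward(reward):
    """Return -1.0, 0.0, or 1.0 according to the sign of the raw reward."""
    return float((reward > 0) - (reward < 0))


# Hypothetical raw rewards from a few environment steps.
raw_rewards = [0, 100, 0, 300, -50]
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)
```

    Sign clipping discards the magnitude of each reward, so it makes learning rates comparable across games at the cost of no longer distinguishing small from large payoffs.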
    The N Implementation Details of RLHF with PPO
    submitted by /u/vwxyzjn [link] [comments]
    Creating a Custom Environment in Unreal Engine 5
    Hello, I would like to create my own environment (a maze) in which to train my drone using reinforcement learning. I am fairly new to this and don't know how to set the state space and rewards, or, if I use SB3 (Stable-Baselines3) for training, how to connect the environment. As for the agent, which is the drone, should I just run AirSim's build.cmd, take the agent from there, and place the starting position flag, or something else? I am a bit lost and can't find tutorials on how to do this, so I'd appreciate any guidance. Thanks in advance. submitted by /u/Gabii99 [link] [comments]
  • Open

    Intelligent document processing with Amazon Textract, Amazon Bedrock, and LangChain
    In today’s information age, the vast volumes of data housed in countless documents present both a challenge and an opportunity for businesses. Traditional document processing methods often fall short in efficiency and accuracy, leaving room for innovation, cost-efficiency, and optimizations. Document processing has witnessed significant advancements with the advent of Intelligent Document Processing (IDP). With […]  ( 20 min )
    T-Mobile US, Inc. uses artificial intelligence through Amazon Transcribe and Amazon Translate to deliver voicemail in the language of their customers’ choice
    This post is co-authored by Dhurjati Brahma, Senior Systems Architect at T-Mobile US, Inc and Jim Chao, Principal Engineer/Architect at T-Mobile US, Inc and Nicholas Zellerhoff Associate Systems Architect at T-Mobile US, Inc. T-Mobile US, Inc. provides a Voicemail to Text service to its customers, which allows customers to quickly read through their voicemails and […]  ( 7 min )
  • Open

    DSC Weekly 24 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 24 October 2023 appeared first on Data Science Central.  ( 21 min )
    Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques
    Written by Venkata Nori and Kshitij Gopali. Introduction As technology is evolving, most companies in the world are adopting advanced mechanisms for their daily tasks of storing/updating data, project management & tracking, incident management, version control, etc. Periodically, these companies’ business stakeholders would want to extract and analyze the data to see how the business… The post Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques appeared first on Data Science Central.  ( 25 min )
    How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare
    A recent interview by Medical Device Network with GlobalData medical analyst Alexandra Murdoch shares interesting insights into cybersecurity for medical devices. The post How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare appeared first on Data Science Central.  ( 22 min )
    Skills required to excel in a business analytics career
    In the contemporary business landscape, where data is heralded as the new oil, Business Analytics has emerged as a pivotal domain, steering organizations towards informed decision-making and strategic planning. Business analytics encompasses the utilization of data, statistical algorithms, and machine learning techniques to comprehend the business context, forecast future trends, and facilitate optimal decision-making. The… The post Skills required to excel in a business analytics career appeared first on Data Science Central.  ( 22 min )
    GenAI: The game-changer in data analytics
    In an era where data drives decisions, GenAI emerges as a prodigious force in the realm of data analytics. According to Statista, LLM’s market size is expected to show an annual growth rate of 24%, resulting in a market volume of $207 bn by the end of 2030. This cutting-edge technology, built on sophisticated algorithms… The post GenAI: The game-changer in data analytics appeared first on Data Science Central.  ( 22 min )
  • Open

    Animated AI
    submitted by /u/nickb [link] [comments]
  • Open

    Best of N series
    A couple days ago I wrote about the likelihood of the better team winning a best-of-five or best-of-seven series. That is, if the probability of X winning a game against Y is p > ½, how likely is it that X will win a majority of 5 games or a majority of 7 games. This […] Best of N series first appeared on John D. Cook.  ( 6 min )
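    The quantity described above can be computed directly. The sketch below assumes games are independent with a constant win probability p, and uses the fact that winning a majority of N played games has the same probability as winning a series that stops once it is decided. The function name `series_win_probability` is an illustrative choice.

```python
from math import comb


def series_win_probability(p, n):
    """Probability that a team with per-game win probability p wins a best-of-n series."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    wins_needed = n // 2 + 1
    # Binomial tail: probability of at least `wins_needed` wins in n independent games.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(wins_needed, n + 1)
    )


for n in (1, 5, 7):
    print(n, round(series_win_probability(0.6, n), 6))
```

    For p = 0.6 this gives about 0.683 for a best-of-five and about 0.710 for a best-of-seven, so a longer series favors the better team, though only modestly.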
    Lessons from Skylab
    I discovered the Space Rocket History Podcast a while back and listened to all the episodes on the Apollo program. I’m now listening to the episodes on Skylab as they come out. I came for Apollo; I stayed for Skylab. I would not have sought out the episodes on Skylab, and that would have been […] Lessons from Skylab first appeared on John D. Cook.  ( 6 min )
    Curvature: principal, Gauss, and mean
    This post will compute the center of curvature for an object described in the previous post. In order to do that we first need to describe principal curvature and Gauss curvature, and we’ll throw in mean curvature while we’re at it. Let S be a surface sitting in three dimensional space. No need for more […] Curvature: principal, Gauss, and mean first appeared on John D. Cook.  ( 6 min )
    An algebraic superegg
    One year ago I wrote about a variant of the squircle that is quantitatively close to the customary definition but that has nicer algebraic properties. That post used the term p-squircle for the usual squircle with equation where p > 2, and the term s-squircle for the variation with equation where s is between 0 […] An algebraic superegg first appeared on John D. Cook.  ( 5 min )
  • Open

    On Razer’s Edge: VFX Star Surfaced Studio Creates Stunning Sci-Fi World This Week ‘In The NVIDIA Studio’
    Visual effects artist Surfaced Studio returns to 'In the NVIDIA Studio' to share his real-world VFX project, created on a brand new Razer Blade 16 Mercury Edition laptop powered by GeForce RTX 4080 graphics.  ( 8 min )

  • Open

    [P] Traffic signs in ecognition developer
    Hi community, first time posting here. I'm working on a project for the segmentation and classification of traffic signs using eCognition Developer software. I need help with creating scripts to apply three classifiers: Naive Bayes, SVM, and Random Forest. I'd like to know how I can implement these classifiers in eCognition Developer and where to insert the scripts in the software. Does anyone have experience with this software and could share script examples or provide guidance on how to accomplish this task? Sorry English is not my first language. Tldr, i need to include the Bayes classifiers, Random Tree, and SVM in eCognition Developer (for segmentation and classification - prediction). submitted by /u/Dignai [link] [comments]  ( 9 min )
    [P] Using gpt4docstrings to generate docstrings for entire projects
    gpt4docstrings is a Python library that writes docstrings for the undocumented functions and classes in your codebase. Here, I'm applying the library to one module of langchain to see the results. Repo: https://github.com/MichaelisTrofficus/gpt4docstrings https://i.redd.it/78f3wit071wb1.gif submitted by /u/Hefty-Consequence443 [link] [comments]  ( 9 min )
    [P] DQN with a binary vector as output
    Hey everyone! I hope you're doing well. I need your help. I'm working on a DQN that outputs a binary vector of length L (I apply a sigmoid function on the output layer and take p > 0.5 as 1, and 0 otherwise). In this setting, at each decision time, the agent returns a list containing the indices of the selected elements. Given that the list's length is dynamic, how can I train my DQN? (I am facing issues with this.) Is there an alternative way to do this (like DDPG)? submitted by /u/GuavaAgreeable208 [link] [comments]  ( 9 min )
    [Project] Looking for AI/ML engineers to team up for a fallow deer identification project
    Hi, first of all, sorry for the cross post, but I guess Huggingface forums were not the right place to begin with and it took me a while to find out where things about AI/ML are being actively discussed. I am a professional software developer (C, Python on Linux) and while I did try out a few things with PyTorch and Diffusers - I am not an ML engineer, so I am looking for someone with ML expertise who’d be interested to team up for a non commercial open source project. I can do quite a lot around application development, but I clearly lack the required ML knowledge. I followed the free MIT ML courses on YouTube, did some reading, tried things out, but the ML part of this project is for sure over my head. So, here’s what I have in mind: I would like to create an application which would b…  ( 11 min )
    [D] Using SQL to monitor ML models
    Hello, we are running a number of machine learning models in production and would like to monitor some metrics during inference: data quality, inference time, accuracy, etc. All these metrics could be recorded in the Python code, and we are planning to build a SQL database that will receive all the information so that we can visualize it in Grafana. Do you think this is a good pattern? What would you suggest instead (we are using AWS)? Thank you in advance. submitted by /u/Eddas123 [link] [comments]  ( 9 min )
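    A minimal sketch of the pattern described above, assuming SQLite (from the Python standard library) as a stand-in for whatever SQL database is eventually chosen. The table name inference_log, its columns, and the helper functions are hypothetical, not taken from the post:

```python
import sqlite3
import time


def open_metrics_db(path=":memory:"):
    """Open (or create) the metrics database and make sure the log table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS inference_log (
            ts               REAL    NOT NULL,  -- Unix timestamp of the request
            model            TEXT    NOT NULL,  -- model name or version
            latency_ms       REAL    NOT NULL,  -- inference time
            missing_features INTEGER NOT NULL,  -- simple data-quality signal
            prediction       REAL               -- model output, for drift checks
        )
        """
    )
    return conn


def log_inference(conn, model, latency_ms, missing_features, prediction):
    """Record one inference call as one row."""
    conn.execute(
        "INSERT INTO inference_log VALUES (?, ?, ?, ?, ?)",
        (time.time(), model, latency_ms, missing_features, prediction),
    )
    conn.commit()


def summarize(conn, model):
    """Aggregate the kind of numbers a dashboard panel would plot."""
    count, avg_latency, avg_missing = conn.execute(
        "SELECT COUNT(*), AVG(latency_ms), AVG(missing_features) "
        "FROM inference_log WHERE model = ?",
        (model,),
    ).fetchone()
    return {"count": count, "avg_latency_ms": avg_latency, "avg_missing": avg_missing}


conn = open_metrics_db()
log_inference(conn, "churn-v1", latency_ms=10.0, missing_features=0, prediction=0.8)
log_inference(conn, "churn-v1", latency_ms=20.0, missing_features=2, prediction=0.1)
print(summarize(conn, "churn-v1"))
```

    In production the same INSERT would target a server database that the dashboard can query. Batching writes, or logging asynchronously, may be worth considering so that monitoring does not add latency to inference.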
    [R][P] Trying to understand the generative properties of autoencoders
    A while back, I came across the "From Variational to Deterministic Autoencoders", which provided a novel insight into the generative properties of autoencoders by framing the objective through the lens of regularization. However, I couldn't help but notice that the deterministic models studied felt incomplete, namely due to the inherent lack of sampling in those models (which is something that the authors acknowledge). To provide a short recap of the paper, the authors surgically decompose the variational autoencoder objective into a deterministic one. They start with a Constant-Variance VAE, which is a special case of the general Gaussian latent VAE where the noise standard deviation of the latent distribution is fixed to 1. This leads to what is essentially a standard autoencoder with t…  ( 10 min )
    [D] What is the lowest possible loss for a language model?
    Example: Suppose a character-level language model (three input letters to predict the next one) is trained on a dataset that contains three instances of the sequence aei, two followed by o and one followed by u, i.e., the dataset of (input → output) pairs is: aei → o, aei → u, aei → o. In this case, the ideal softmax of the model's logits for the input aei would be ~0.66 for o, ~0.33 for u, and zero for all other letters. Following this reasoning, the objective is to optimize the model's output for a given input to match the distribution of occurrences in the dataset. If this reasoning is correct, then we have the following ideal loss (cross-entropy): https://preview.redd.it/pzpxogcqd0wb1.png?width=330&format=png&auto=webp&s=b0b6c3b5fbfb4797c11a1f26375065ce883551d3 Thus, ~0.63 is the smallest loss we can get with this dataset. Is my reasoning correct? submitted by /u/viniciusarruda [link] [comments]  ( 9 min )
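    The calculation in the post can be checked numerically. The sketch below (the function name min_cross_entropy is made up for illustration) computes the lowest average cross-entropy, in nats, that any model can reach on a set of (input, output) pairs by predicting the empirical conditional frequencies:

```python
import math
from collections import Counter, defaultdict


def min_cross_entropy(pairs):
    """Lowest achievable average cross-entropy (nats) on `pairs`.

    The minimum is reached when, for every input, the predicted distribution
    equals the empirical distribution of outputs seen after that input.
    """
    by_context = defaultdict(Counter)
    for context, target in pairs:
        by_context[context][target] += 1

    loss = 0.0
    for counts in by_context.values():
        n = sum(counts.values())
        for c in counts.values():
            # c examples each contribute -log(empirical probability c / n)
            loss -= c * math.log(c / n)
    return loss / len(pairs)


dataset = [("aei", "o"), ("aei", "u"), ("aei", "o")]
floor = min_cross_entropy(dataset)
print(round(floor, 4))  # 0.6365
```

    For the three-example dataset this gives about 0.6365, in line with the ~0.63 quoted in the post. This is a floor on the training loss only: a model that reaches it has matched the dataset's conditional frequencies exactly, and its loss on held-out data can still be higher.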
    [D] Tanh activation function outputs the same value for any given input
    Basically I'm working on the DDPG algorithm in DRL, where I have actor and critic networks. The actor network architecture is quite simple: an input layer with 22 neurons that represent the state values (ranging from 0.1 to 10.0 max; I am not normalizing them); two hidden layers with 128 neurons each, with Leaky ReLU activation (alpha = 0.01) and the HeUniform kernel initializer; and an output layer with a single neuron with tanh activation, using the Glorot kernel initializer. The critic network has the same architecture, except that we concatenate the 22 state values with the action produced by the actor and the output of the critic has no activation. Both networks use Adam. I run a few steps without learning first; the problem arises when the learning starts, because the actor quickly converges to outputting 1 or -1 for any given input. I tried many learning rates for both actor and critic. One thing to note is that when I set the actor learning rate to 1e-5 and the critic's to 1e-3, the networks sometimes converge quickly, sometimes take longer to converge, and sometimes do not converge. submitted by /u/Desert_champion [link] [comments]
    [P] Fine-tuning VAEs on limited data
    I have been looking for a pre-trained VAE (on ImageNet with ResNet/VGG) or similar which I could fine-tune on my smaller dataset. However, not only are there few such pre-trained weights, but the practice of fine-tuning VAEs does not really seem mainstream. Is there a reason why VAEs are not pre-trained/fine-tuned? Does it have to do with posterior collapse? submitted by /u/unholy_sanchit [link] [comments]  ( 9 min )
    [D] Smart pooling for Visual Transformers
    There is an architecture for images/videos called MViT, where 2D MaxPooling layers are added to reduce computations for ViT. But MaxPooling has a drawback - it discards information independently of context, equally discarding information from both important and uninformative parts of the image. For traditional Conv2D networks, there's little we can do about this, but for transformers, we can reduce dimensionality in a more meaningful way - discarding only those elements that don't carry unique information. Are there any articles/developments on this topic already? submitted by /u/Dependent_Bluejay_45 [link] [comments]  ( 9 min )
    "[Research]" RVC AI Training
    Hello, I'm currently using RVC AI, and I'm about to record myself for the training. What is the best way to record myself except the singing and talking at least for 15 minutes like the guide says. Do I have to make it 20 min and one audio file or do I have to make it 20 min and maybe 10 files with 2 minutes each file? Also, can I multiply my files and reach the 15-20 minutes of audio that it's required or I have to make a different talking or singing for every audio? submitted by /u/WeldFrenzy [link] [comments]  ( 9 min )
    [D] RAG oriented fine-tune... Searching for coherence
    Still searching for a model that is good enough for RAG... There are lots of good models on Hugging Face, but none of them is trained to return extracted text or answers based on the provided info without hallucinating something. It is quite frustrating: every week a new version of a model comes out that is amazing for role play and storytelling... (some good progress also on coding...) I see lots of effort going into different RAG strategies, improving semantic search and chunking, but the open source community still does not have a decent model fine-tuned for that. I have considered doing that fine-tune myself, based on synthetic data (using Wikipedia as the knowledge base), but unfortunately I do not have enough funds to cover the API cost or to pay for a decent GPU. I'm not going to train a 7B model, because models under 30B imho don't make much sense if coherence is the main requirement. Unpopular opinion: for coherence, Code Llama 34B is much better than any of the 70B fine-tunes. Sorry to everyone for the rant... Does anyone have some tips or suggestions? Thanks in advance! Edit: My database is composed mainly of abstracts of papers and medical textbooks. I admit that the domain is quite complex, but the error rate is too high, even when the model is prompted to avoid that (I tried and refined multiple prompts, using different prompt formats). GPT-3.5, Claude Instant and Palm2-Bizon work fine for that task (obviously GPT-4 and Claude 2 would be best, but they are too expensive for me). I spent lots of time building a solid embedding pipeline: advanced chunking; metadata added by an LLM; text for similarity search different from the text provided to the LLM; an instructor bi-encoder to generate embeddings (INSTRUCTOR-XL); reranking using a cross-encoder; RAG-Fusion using multiple queries and the HyDE approach; hybrid search with BM25. So... I'm a bit frustrated that I cannot run everything locally, because that is a must for my project. submitted by /u/Distinct-Target7503 [link] [comments]  ( 10 min )
    [R] 2x the context length of ALiBi through position interpolation
    https://arxiv.org/abs/2310.13017# Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths. We propose using linear position interpolation to extend the extrapolation range of models using Attention with Linear Biases (ALiBi). We find position interpolation significantly improves extrapolation capability on upstream language modelling and downstream summarization and retrieval tasks. submitted by /u/jwan584 [link] [comments]  ( 9 min )
    [D] How to make research publication more reproducible?
    As context, I'm personally working on a project to make ML/AI research publication more reproducible. We're backed by Balaji Srinivasan (https://twitter.com/balajis), who provides both funding and advice. It seems like, despite attempts like Jupyter Notebooks or sites like Papers with Code, most published research in ML still isn't set up to be easily reproducible. Even companies like Anthropic/OpenAI don't put much of an emphasis on reproducibility, even though it's in their interest to do so to earn public trust. Our current hypothesis is to conceptualize reproducible research as software testing. Specifically, we're thinking of building tools that let you internally test the robustness of results and externally publish them such that they're reproducible. You can think of it as continuous integration for reproducible research, e.g. BuildBot for Reproducible Research. One specific idea I have is to build a model evaluation/testing platform that lets you: (1) internally evaluate LLMs on open benchmarks (TruthfulQA, AGIEval, etc.); (2) test the robustness of results under different assumptions; (3) externally publish reproducible results. I don't have a background in ML research, so I'm looking to get input from research engineers on what challenges/barriers currently exist with model testing and publishing reproducibly. I thought I'd reach out to this community in case anyone's open to that! Let me know if this post doesn't conform to the rules, or if this should go somewhere else. submitted by /u/manveerbasra [link] [comments]  ( 9 min )
    [P] Image Captioning Model
    Hello everyone, I am currently trying to find suitable image captioning and visual question answering models to implement in my project. After a quick Google search I came across BLIP2 from Hugging Face; however, it's a very large model overall, and neither my PC nor Colab could load even its lightest pretrained version. Does anyone know any similar pretrained models for these specific tasks, or any other way to load this kind of large model? (I tried loading it with 8-bit precision, which still failed.) I have 16 GB of RAM, and the task requires image captioning and the ability to ask the model for details about a specific image. Any help is greatly appreciated!! submitted by /u/Spitefulsalamander [link] [comments]  ( 9 min )
    [D] Episodic Training vs. Random Sub-Sampling in Few-Shot Learning
    I'm new to few-shot learning and I'm having trouble understanding why prototypical networks use a random sub-sampling approach while the vanilla few-shot learning approach uses episodic training. Doesn't random sub-sampling fail to guarantee that data overlapping won't occur? submitted by /u/The_Aoki_Taki [link] [comments]  ( 9 min )
    [D] High-temperature softmax
    I implemented a label propagation algorithm which is mainly used in the field of Video Object Segmentation (VOS). Basically I provide the labels for one frame and ask my model (using pre-trained encodings of frames) to do semantic segmentation on all the other frames of a video. I am obtaining consistently better results using a high-temperature softmax when computing the similarity between pixels of different frames. Then the top-k similarities of each pixel (features) are used to propagate the labels from one frame to the next. I will not disclose the dataset I am using, but let's say it is noisy (and also low quality). I want to understand why a high-temperature softmax performs better than a softmax with T = 1 or an extreme T = 0.01. At the moment I get better results with T = 10 and T = 100, and the trend in my grid search shows that even higher T could be possible. I was wondering if the model can still be considered valid if T is too high. I feel like the model is almost randomly guessing if T is too high, but this apparently enhances performance. Any help is appreciated, as is literature about the topic! I only found one paper (which uses a high-temperature softmax to distill knowledge in a student-teacher network for remote sensing imagery). submitted by /u/darthjeio [link] [comments]  ( 9 min )
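    To make the effect of T concrete, here is a small sketch of a temperature-scaled softmax over the similarities of one pixel to its top-k neighbours. The similarity values are made up for illustration; nothing here comes from the poster's dataset or code:

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax of scores / temperature, computed in a numerically stable way."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means closer to uniform weights."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical similarities between one pixel and its top-4 neighbours.
similarities = [0.9, 0.7, 0.2, -0.1]

for t in (0.01, 1.0, 10.0, 100.0):
    weights = softmax(similarities, t)
    print(t, [round(w, 3) for w in weights], round(entropy(weights), 3))
```

    One possible reading, offered as a hypothesis rather than an explanation of the poster's results: at high T the top-k neighbours vote with nearly equal weight, which behaves like an unweighted k-nearest-neighbour average and may be less sensitive to noisy similarity scores. The ranking that selects the top-k does not depend on T, so the propagation is not random even when the weights are almost uniform.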
    [D] Callbacks in tensorflow v1
    Hi everyone, I have some old code written in tf1. It has not been ported to tf2 or PyTorch yet. Do any of you have leads on whether one can implement a custom callback for tf1 code, and whether there are any examples on the web? Thanks in advance. submitted by /u/wrik003 [link] [comments]  ( 9 min )
    [N] CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
    In the tech report of GPT4, an analysis was conducted on the impact of different languages on model performance. These effects are attributed to the amount of data and language characteristics. This also indicates that the model's effectiveness may not meet the expectations of users in different languages. The problem addressed in this paper is of significant importance. https://preview.redd.it/s48419fe9yvb1.jpg?width=2748&format=pjpg&auto=webp&s=ba76f1bd18043c6cb2610ed90f5c41a78b5ccd95 Arxiv: https://arxiv.org/abs/2310.13683v1 Stay updated with AI in a fun-to-listen way. Check out ai-dailynews.com to generate your personalized news podcast🎙. It's one of my open-source projects and takes no charge. submitted by /u/xuying_li [link] [comments]  ( 9 min )
    [N] Neural-Base Music Generation for Intelligence Duplication
    The paper employs a deep learning system to learn from the great composer Beethoven and capture his composition ability in a hash-based knowledge base. This new form of knowledge base provides a reasoning facility to drive the music composition through a novel music generation method. https://preview.redd.it/l9gzcoe38yvb1.png?width=1944&format=png&auto=webp&s=d6c5ca7f8fe434be1187c1f0440c5a94ebfc9b64 Arxiv: https://arxiv.org/abs/2310.13691v1 For more AI updates, check out this AI-generated news podcast🎙 tailored to your preferences(ai-dailynews.com), which is open source and free. submitted by /u/xuying_li [link] [comments]  ( 9 min )
    [D] Biclustering with the same row and column clusters
    The biclustering algorithm partitions the rows and columns of a matrix into clusters so that the variance inside each intersection between row and column clusters is minimized. I want to perform the biclustering of a matrix, but additionally enforce that the row and column clusters are the same, i.e. if row i lies inside row-cluster c, then column i must lie in column-cluster c. Rows and columns in the matrix represent the same entities (but the matrix is non-symmetric). The sklearn implementation does not support such a constraint. Are there any algorithms for this at all? submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [D] Referenceless NLP Evaluation
    Hey all, I'm building this open source project that helps ML engineers evaluate LLM applications (it's like unit testing for LLMs). It works great in development, since users can just write a test_file.py as they normally would in pytest, but as I move on to the next phase I'm thinking about how to bring evaluation to production, especially for metrics such as factual consistency, where I need a ground truth. I'm hoping to get some ideas around this. Here's a link to the repo (https://github.com/confident-ai/deepeval) if you want more clarity on what the package looks like, but most importantly, any help brainstorming production evaluation will be greatly appreciated. Thank you very much! submitted by /u/Ok_Constant_9886 [link] [comments]  ( 9 min )
    [D] Is Computer Vision dead? - “Quo Vadis, Computer Vision?”
    At ICCV23, several top-notch researchers shared their insights (in a workshop called “Quo Vadis, Computer Vision?”) on the current state of Computer Vision, especially in light of the meteoric rise of LLMs. Has CV stalled? Is CV dead? E.g. MIT professor Bill Freeman has some interesting points on foundation models: “FM aren’t fundamental, therefore not stable”. Jitendra Malik argues “video can describe the world better than text.” submitted by /u/btcmx [link] [comments]  ( 9 min )
    [R] Biologically plausible vision models for classification and grasping tasks
    Hey everyone! I am looking for papers that propose or explore biologically plausible vision models, primarily for tasks like classification and grasping (predicting grasping bounding boxes). By biologically plausible, I mean papers that propose models inspired by the human brain in some way or the other. I know convolution is loosely inspired by human cognition, but everything I can find seems to suggest the opposite for ViT-like models. I have come across certain papers like these: - https://arxiv.org/abs/1901.00945 - https://proceedings.neurips.cc/paper/2020/hash/98b17f068d5d9b7668e19fb8ae470841-Abstract.html But I am still looking for more. Any suggestions? submitted by /u/Far_Clothes_5054 [link] [comments]  ( 9 min )
    [D] Understanding the math behind diffusion models
    I was trying to comprehend the math behind this paper: https://arxiv.org/pdf/2006.11239.pdf. You can see in the equation corresponding to the forward diffusion process, at each time step, the image in the previous step is also scaled by sqrt(1-beta_t) while adding noise. It seems like the purpose of this is to maintain a fixed variance (or specifically, unit variance) at each time step. My question is: What is the significance of maintaining unit variance at each time step? Why is this useful? I saw somewhere that this is done to prevent the variance from "exploding." I don't really know what this means. I guess the variance keeps on increasing if the scaling isn't done. But why is this bad? submitted by /u/fallendeviL701b [link] [comments]  ( 9 min )
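    The role of the sqrt(1-beta_t) factor can be seen by tracking the variance alone. For the forward step x_t = sqrt(1-beta_t) x_{t-1} + sqrt(beta_t) eps with unit-variance noise eps independent of x_{t-1}, the per-pixel variance obeys Var_t = (1-beta_t) Var_{t-1} + beta_t; without the scaling it would be Var_t = Var_{t-1} + beta_t. The sketch below iterates both recursions, assuming the linear schedule from 1e-4 to 0.02 over 1000 steps used in the linked paper; the function name is illustrative:

```python
def variance_trajectory(betas, var0, rescale):
    """Per-pixel variance after each forward diffusion step.

    rescale=True  : x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * eps
    rescale=False : x_t = x_{t-1} + sqrt(beta) * eps   (no shrinking)
    eps is unit-variance noise independent of x_{t-1}.
    """
    var = var0
    history = [var]
    for beta in betas:
        keep = (1.0 - beta) if rescale else 1.0
        var = keep * var + beta
        history.append(var)
    return history


T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

scaled = variance_trajectory(betas, var0=1.0, rescale=True)
unscaled = variance_trajectory(betas, var0=1.0, rescale=False)
off_scale = variance_trajectory(betas, var0=0.25, rescale=True)

print(scaled[-1], unscaled[-1], off_scale[-1])
```

    With the scaling, unit variance is a fixed point, and data that does not start at unit variance is pulled towards it. Without the scaling, the variance grows by the sum of all betas, so the final distribution depends on the schedule and the number of steps. A plausible reading of why this matters is that the network then sees inputs on the same scale at every t, and sampling can start from a standard normal regardless of T.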
    [D] Neural Attention - One simple example that explains everything you need to know
    submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [D] Has anyone tried deploying FastAPI v2 with a BERT model on the NVIDIA Triton Inference Server?
    I'm not sure how to enable BERT with flash attention during the start-up of the Triton server in order to accelerate inference. Dao (the author of FA) told me he’s never tried. submitted by /u/g14loops [link] [comments]  ( 9 min )
  • Open

    Etsy Taking Stores Down as its Bot Can't Tell Which Mockups are Real and Which Ones are AI Generated
    If you are an Etsy seller or know someone who sells on Etsy, or maybe you went on Etsy and your favorite store is gone, it could be because Etsy's bots are taking down stores after failing to work out which mockup images are real and which ones are AI generated. All you have to do to find this out is go on YouTube or social media and look for "etsy mockups news". Etsy has also been pretty quiet about this, and as a result Etsy sellers are going crazy, as no one knows why some stores that haven't used AI to create their mockups are being targeted by these bots. This just goes to show how hard it is getting to distinguish between what is real and what is AI generated, and how companies across all industries are having issues adapting to AI technology changes. Thoughts? submitted by /u/fk1220 [link] [comments]
    New data poisoning tool lets artists fight back against generative AI
    Nightshade is a new data poisoning tool that allows artists to fight back against generative AI models. By adding invisible changes to the pixels in their art, artists can cause chaos and unpredictable results in AI models that use their work without permission. The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth. AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that th…
    I would like to upload 100+ one-hour-long podcasts in MP3 and get a 1-page summary of the most important points discussed in each episode — what's the best way to go about doing this?
    ChatGPT and Bard are cool, but I have to manually feed them transcripts generated by Whisper to get summaries. Furthermore, since the length of the transcript is often longer than the maximum character limit(s), I have to add additional prompts in between copying and pasting multipart transcripts. Since these recordings are 10–15 years old, the audio quality isn't the best, but I think it's sufficient to generate transcripts + detect speech, if not, I might need an additional "audio cleaning" step as well. I don't mind paying, and I'm above average in technical ability, so if anyone has any suggestions, I'd love to hear them. Here's what the workflow would look like: INPUT: I will upload a folder containing 100+ MP3 files of podcasts with below-average audio quality. OUTPUT: I would like to get a Google Doc or a Text file with 1-page summaries of the most important points in bullet-point format corresponding to each episode. Each page should be separated by some sort of divider, and the header should contain the filename for reference. Ideally, there should be an existing Jupyter Notebook I could throw in Google Colab and do all of the above in a plug-and-play manner, but if not, I'd love to hear your thoughts. Any tips? Thanks! submitted by /u/aknalid [link] [comments]
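    The copy-and-paste step caused by the length limit can be automated with a map-reduce style summary: split each transcript into chunks that fit the limit, summarize each chunk, then summarize the joined partial summaries. In the sketch below, summarize is a placeholder for whatever model call is used (it is not a real API), and transcription is assumed to have already produced plain text:

```python
def chunk_words(text, max_chars):
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current, size = [], [], 0
    for word in text.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and size + extra > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
            extra = len(word)
        current.append(word)
        size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text, summarize, max_chars=8000):
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is any callable str -> str; here it is a hypothetical stand-in.
    The joined partial summaries are assumed to fit within the limit.
    """
    partial = [summarize(chunk) for chunk in chunk_words(text, max_chars)]
    if len(partial) == 1:
        return partial[0]
    return summarize("\n".join(partial))


def first_words(text, n=5):
    """Toy stand-in for a real summarizer: keep the first n words."""
    return " ".join(text.split()[:n])


transcript = "welcome to the show " * 50
print(summarize_long(transcript, first_words, max_chars=200))
```

    To produce the single output file described in the post, the result for each episode could be written under a header containing the MP3 filename, followed by a divider line.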
    The dilemma of potential AI consciousness isn't going away - in fact, it's right upon us. And we're nowhere near prepared. (MIT Tech Review)
    https://www.technologyreview.com/2023/10/16/1081149/ai-consciousness-conundrum/ "AI consciousness isn’t just a devilishly tricky intellectual puzzle; it’s a morally weighty problem with potentially dire consequences. Fail to identify a conscious AI, and you might unintentionally subjugate, or even torture, a being whose interests ought to matter. Mistake an unconscious AI for a conscious one, and you risk compromising human safety and happiness for the sake of an unthinking, unfeeling hunk of silicon and code. Both mistakes are easy to make." "Every expert has a preferred theory of consciousness, but none treats it as ideology—all of them are eternally alert to the possibility that they have backed the wrong horse." "The trouble with consciousness-­by-committee, though, is that this state of affairs won’t last. According to the authors of the white paper, there are no major technological hurdles in the way of building AI systems that score highly on their consciousness report card. Soon enough, we’ll be dealing with a question straight out of science fiction: What should one do with a potentially conscious machine?" "For his part, Schwitzgebel would rather we steer far clear of the gray zone entirely. But given the magnitude of the uncertainties involved, he admits that this hope is likely unrealistic—especially if conscious AI ends up being profitable. And once we’re in the gray zone—once we need to take seriously the interests of debatably conscious beings—we’ll be navigating even more difficult terrain, contending with moral problems of unprecedented complexity without a clear road map for how to solve them." submitted by /u/kamari2038 [link] [comments]
    The Future of AI Voice Technology
    submitted by /u/Amandacerni [link] [comments]
    UK officials use AI to decide on issues from benefits to marriage licences
    submitted by /u/sky_badger [link] [comments]
    One-Minute Daily AI News 10/22/2023
    A new AI agent Eureka developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can.[1] Meta’s Habitat 3.0 simulates real-world environments for intelligent AI robot training.[2] South Korea’s SK telecom Co. will collaborate with Deutsche Telekom AG to jointly develop a telecommunications-specific artificial intelligence (AI) large language model (LLM) as competition intensifies among local telecom companies to expand overseas with their own AI capabilities.[3] Scientists say they have built an artificial intelligence (AI) tool that can successfully identify and confirm supernovas.[4] Sources: [1] https://blogs.nvidia.com/blog/2023/10/20/eureka-robotics-research/ [2] https://siliconangle.com/2023/10/20/metas-habitat-3-0-simulates-real-world-environments-intelligent-ai-robot-training/ [3] https://pulsenews.co.kr/view.php?year=2023&no=810112 [4] https://learningenglish.voanews.com/a/researchers-build-first-tool-to-discover-supernovas/7318435.html submitted by /u/Excellent-Target-847 [link] [comments]
    How To Earn $1M+ By Using AI To Write Books
    I've been using AI for a long time. It often helps me reduce my work time, but I want to try to earn money with it, so I decided to investigate. I want to hear your opinion on my analysis, and maybe this post will help someone start a business through AI. Joe Popelas, a very young entrepreneur, has made over a million dollars within the last year selling AI-generated books online. I literally got fascinated by how simple yet powerful it is with these tools to create a book within a matter of a few hours. Joe Popelas is one of a new breed of AI entrepreneurs who capitalized on the democratization of large language models. Joe's story demonstrates the power of combining human creativity with AI. While AI tools did the heavy lifting for his initial drafts, Joe spent time refining …
  • Open

    From text to dream job: Building an NLP-based job recommender at Talent.com with Amazon SageMaker
    This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com. Founded in 2011, Talent.com is one of the world’s largest sources of employment. The company combines paid job listings from their clients with public job listings into a single searchable platform. With over 30 million jobs listed […]  ( 12 min )
  • Open

    Street View to the Rescue: Deep Learning Paves the Way to Safer Buildings
    Images such as those in Google Street View are taking on a new purpose in the hands of University of Florida Assistant Professor of Artificial Intelligence Chaofeng Wang. He’s using them, along with deep learning, in a research project to automate the evaluation of urban buildings. The project aims to help governments mitigate natural disaster Read article >  ( 6 min )
  • Open

    How to properly evaluate competitive MARL?
    Hello, everyone! I'm building a MARL agent for a zero-sum game and I'm having a hard time evaluating it. I managed to quickly train it for a simple case, and I could manually verify that it was actually learning the optimal decision making, because I already know how the game works and, for this simple case, I know that there is a mathematically correct way to play it (from both sides) and how it should be played. That isn't true for most cases, though (and even if it were, I wouldn't be able to manually verify thousands of games). To complicate things even more, there are billions and billions of possible initial states. For single-agent RL, I could set a reward threshold (if I knew the maximum possible reward), or at least I could set a maximum time of "no improvement", but in a zero-sum game the sum of the policy rewards is, well, zero. I could think of two solutions: (1) evaluate convergence to a Nash equilibrium on a subset of the possible initial states, which could be a problem because I'm not sure the game dynamics guarantee the existence of Nash equilibria; (2) evaluate convergence of the win rate of the trained agent against a "hand-crafted" baseline agent, which could be a problem because the quality of this evaluation method depends on how good I can make this baseline agent (which won't be even close to optimal, otherwise I wouldn't be training an agent). Any thoughts? submitted by /u/victorsevero [link] [comments]
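    For the second option (win rate against a fixed opponent), one practical detail is deciding when a measured win rate differs from 50% by more than sampling noise. The sketch below uses a Wilson score interval for a binomial proportion; it assumes games are independent and either won or lost (draws would need separate handling), and the function names are illustrative:

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win probability (z=1.96 is about 95%)."""
    if games == 0:
        return (0.0, 1.0)
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return (centre - half, centre + half)


def clearly_better(wins, games, z=1.96):
    """True if the whole interval lies above a 50% win rate."""
    low, _ = wilson_interval(wins, games, z)
    return low > 0.5


# Hypothetical evaluation results of a checkpoint against a fixed opponent.
for wins, games in [(55, 100), (550, 1000), (5500, 10000)]:
    low, high = wilson_interval(wins, games)
    print(wins, games, round(low, 3), round(high, 3), clearly_better(wins, games))
```

    The same test can be run against frozen earlier checkpoints of the agent instead of a hand-crafted baseline, which sidesteps the baseline-quality concern to some extent. It measures relative progress only and says nothing about distance from a Nash equilibrium.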
    Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation
    submitted by /u/gwern [link] [comments]
    In your opinion, which is the most beautiful form of the Bellman Equation and why?
    Didn't see anything about this kind of post in the rules I'm asking for a tattoo idea haha submitted by /u/victorsevero [link] [comments]
    Inverted pendulum swing-up problem not converging to global optimum using SAC or TD3.
    I am writing a thesis about using RL to solve the inverted pendulum swing-up problem. I have tried using TD3, SAC, and TD3-Fork. In my testing, TD3-Fork worked best; I think SAC would also work if I were able to tune the hyperparameters correctly. I would like a trained agent similar to the converged TD3 one, where the agent balances the pole almost indefinitely. I have tried the hyperparameters from the website and also different hyperparameters, but it has not converged. I am wondering if I am missing something or if there is anything I can do to improve the agent. I have been thinking of using HER instead of FORK. Any help or advice would be appreciated. [Image: training reward data] The 'maximum' reward that I could get in the simulation is >880. The reward function that I used is -[cos(theta) + 10(|x| > 0.9) + 10(|theta_dt| > 18)]. However, from the data above it only converges to about 837 max and rarely reaches >900. [Image: trained td3 fork agent] submitted by /u/YEEETTT0708 [link] [comments]
    Godot enables me to do pure C# Deep reinforcement learning.
    submitted by /u/Vae94 [link] [comments]
    [R] Demo of “Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization” (link to paper in the comments)
    submitted by /u/gwern [link] [comments]
  • Open

    Celebrating Kendall Square’s past and shaping its future
    The 15th Kendall Square Association annual meeting explored new and old aspects of the neighborhood.  ( 9 min )
  • Open

    Nonlinear algebra
    What is nonlinear algebra? Negations are tricky. They may be the largest source of bugs in database queries. You have to carefully think about what exactly are you negating. Any time you see “non-” attached to something, you have to ask what the context is in which the negation takes place. For example, if you […] Nonlinear algebra first appeared on John D. Cook.  ( 6 min )
  • Open

    Are Generalized Self-Supervised ViT Models the Image Objective Counterpart of LLM’s?
    submitted by /u/No-Platypus4021 [link] [comments]
    Neural Networks: A Deep Dive into AI's Building Blocks
    submitted by /u/Emily-joe [link] [comments]
  • Open

    Abstracts: October 23, 2023
    Today on “Abstracts,” Partner Research Manager Andy Gordon & Senior Researcher Carina Negreanu explore new work introducing co-audit, a term for any tool-assisted experience that helps users of generative AI find and fix mistakes in AI output. The post Abstracts: October 23, 2023 appeared first on Microsoft Research.  ( 16 min )

  • Open

    [P] Having GPT-4 Iterate on Unit Tests like a Human
    Hi r/MachineLearning, My name is William and I’m one of the founders of Sweep. Sweep is an AI junior developer that writes and fixes code by mirroring how a developer works. While building Sweep, we used to use the Github API, but we ran into rate limits, so we changed this to clone your repository for the duration of the request. It's now coming full circle. Sweep can now write, run, and debug a failing unit test for the ClonedRepo class! Blog: https://docs.sweep.dev/blogs/ai-unit-tests Video: https://www.youtube.com/watch?v=N9PUxmja9z4 submitted by /u/williamsweep [link] [comments]  ( 9 min )
    [D] Structured learning resources for ML Theory
    So essentially what the title says. I want to truly understand what's happening behind Machine Learning in general and also behind each algorithm specifically (starting from the basics to more advanced things, like Logistic Regression, Decision trees and random forests, Deep Learning, NLP, GANs...). By structured I mean it contains all the pieces ordered and organized, from the same source, so you can actually go from the building blocks up, not just a YouTube channel that uploads interesting videos about different machine learning related topics. Regarding the medium, I don't really mind, but I would prefer audiovisual content (YT channels/playlists, lectures, conferences...); if you really recommend a specific book or series of books, that's also okay. If it has some practical focus to it (to better grasp the theory), that would be great. Also, I would prefer it to go deep into the details but not too deep into the specific maths involved, though if it does, that's also okay. Regarding price, obviously if it's free that would be awesome, but anything in the range of free to 40€ is fine. Thank you in advance for your recommendations! submitted by /u/aleradamantis [link] [comments]  ( 9 min )
    URL PHISHING OR BENIGN USING DEEP LEARNING "[Research]", "[R]", "[Project]", "[P]"
    Guys does anyone have an idea why my model does not work and it's like 50-50 chance to get it right. I'm getting really frustrated. Here is the code so far: ​ import pandas as pd import torch import torch.nn as nn import torch.optim as optim from torch.utils.data import Dataset, DataLoader from collections import Counter from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score import re from imblearn.under_sampling import RandomUnderSampler # Loading the data file_path = "C:/Users/alex/Desktop/DATASET/malicious_phish.csv" data = pd.read_csv(file_path) # Filtering data filtered_data = data[data['type'].isin(['phishing', 'benign'])] # Undersampling the majority class rus = RandomUnderSampler(rand…  ( 12 min )
    [D] - Pre-Training a 4bit model (NOT Fine-tuning)
    Pre-training using 4-bit (NOT fine-tuning). Hello community! I have been messing around with open-source LLMs, running them locally using peft and AutoGPTQ in Transformers. I even trained a few QLoRA models (my favorite part). However, my question is this: given the performance of a 4-bit model, why hasn't there been any research in this area? Is it even possible to create a new model using 4-bit altogether? I am sure it's not as easy as it sounds, but I haven't seen anyone try. Just curious, because it would open doors for many of us with consumer-grade hardware. Thanks! submitted by /u/Delicious-Farmer-234 [link] [comments]  ( 9 min )
    [P] Infinity, a FOSS project for supporting RAG for LLMs and Vector Embeddings.
    https://github.com/michaelfeil/infinity Infinity is an open-source REST API for serving vector embeddings, using a torch / ctranslate2 backend. It's under the MIT License, fully tested, and available on GitHub. I am the main author and curious to get your feedback. FYI: a couple of days after my launch, Huggingface launched a similar project ("text-embeddings-inference") under a non-open-source, non-commercial license. submitted by /u/OrganicMesh [link] [comments]  ( 9 min )
    [R] Combining Thermodynamics and Diffusion Models for Collision-Free Robot Motion Planning
    Researchers from Yonsei University and UC Berkeley recently developed a new AI method for enabling autonomous robots to navigate unfamiliar environments filled with obstacles using only visual data as input. The key innovation is a customized diffusion model. Diffusion models can generate diverse motion plans by adding controlled noise. The researchers tailored the model to mimic how heat avoids insulation when dispersing through space. Similar to heat navigating around insulators, this "collision-avoiding" diffusion model learns to predict robot motions that avoid collisions with obstacles. It generates reachable goals and viable motion plans to those goals simultaneously. In simulations, this approach achieved ~98% success rates in navigating to target destinations while avoiding randomly generated obstacles using only visual map images as input. While extensive real-world testing is still needed (only 2D, only simulation), these initial results showcase promising capabilities: Enables navigation in unfamiliar environments without pre-mapping. Flexibly identifies and progresses toward reachable goals. Avoids unnecessary sensing systems for obstacle avoidance. Learns complex collision avoidance heuristics from visual data. I like the thermo + AI + robotics combination here - takes me back to my days in aerospace engineering. Pretty interesting approach. Full summary is here. Paper here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] Speeding up open source LLMs with speculative decoding
    submitted by /u/firef1y1 [link] [comments]  ( 9 min )
    [P] Open Source AI repos that caught my 👀 this week
    @MetaGPT_ github.com/geekan/MetaGPT - multi agent collaboration - MetaGPT encodes Standard Operating Procedures (SOPs) into prompts. The claim is that it takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc. @Ollama_ai github.com/jmorganca/olla… - run large language models locally. The future of AI/LLMs may not be on the cloud, but on your own laptops/mobiles. ollama.ai/blog/building-… @huggingface github.com/huggingface/ca… - slick ML framework for Rust with a focus on performance (including GPU support) @remilouf github.com/outlines-dev/o… - helps developers guide text generation to build robust interfaces with external systems. Provides generation methods that guarantee that the output will match a regular expression or follow a JSON schema. github.com/YiVal/YiVal - enterprise AI platform submitted by /u/oana77oo [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Teachers struggle to adapt amid AI revolution in education
    submitted by /u/DutchTechJunkie [link] [comments]  ( 9 min )
    [R] Language interaction to assist in composing and refining music
    Hey guys, I found an interesting paper recently. Universities in the UK introduced Loop Copilot, enabling users to generate and iteratively refine music through an interactive, multi-round dialogue interface. Using language interaction to assist in composing music is very appealing, AI makes a complex workflow easy and automated. https://preview.redd.it/nzvlgevcarvb1.jpg?width=998&format=pjpg&auto=webp&s=815e0f48c299831700215ebdb4257423e317f5ec submitted by /u/xuying_li [link] [comments]  ( 9 min )
    [D] How to account for extreme periods in time series forecasting?
    I am performing a (machine learning) time series forecast on monthly data from the last 20 years. If I separate my data into a train, validation, and test set, the validation set is almost completely filled with extreme values due to the Covid period. How to account for this? submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
    [P] Graphing emotion events with LMs for in-depth sentiment analysis
    submitted by /u/helliun [link] [comments]  ( 8 min )
    [D] DINOv2 Breakdown: I've Created a Visual Guide to the Model's Design & a Concise Code Walkthrough
    submitted by /u/CkmCpvis [link] [comments]  ( 9 min )
    [R] Do you read ML/DL/AI related scientific papers? How do you filter them?
    As the title says. Recently, I found a review paper where the authors showed an exponential growth of published papers related to ML or DL. I was wondering if you even read those. If yes what's your way to find good and reliable papers? Do you choose only ones with a significant number of citations? Or just strictly related to your field? If no, why not? https://preview.redd.it/jwjvej5f5qvb1.jpg?width=1080&format=pjpg&auto=webp&s=bf3f7e08e0fe09fe0c6a6fd8d194945b45f5858e submitted by /u/hahahaczyk [link] [comments]  ( 9 min )
    [R] Open-Source Projects on Detecting Landmines
    I know that there are a lot of efforts at the moment to improve the algorithms used for landmine detection. Is anyone aware of any ongoing open-source projects in this space? submitted by /u/Eightstream [link] [comments]  ( 9 min )
    [R] Demo of “Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization” (link to paper in the comments)
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    German researchers create DeepMB for faster, high-quality optoacoustic imaging [N]
    Researchers from Germany have developed DeepMB, a groundbreaking deep-learning framework enabling high-quality and real-time optoacoustic imaging via multispectral optoacoustic tomography (MSOT). With potentially transformative implications for health care, this innovation might redefine medical imaging standards. To stay ahead of developments in AI, look here first. DeepMB breakthrough DeepMB resolves the longstanding tradeoff between image quality and speed in medical imaging. The deep-learning framework uses a deep neural network for model-based reconstruction, allowing for fast, high-quality imaging. DeepMB can reconstruct images approximately 1000 times faster than conventional techniques, with virtually no loss in image quality. Impressive metrics and implications The researchers accomplished accurate optoacoustic image reconstruction in just 31 milliseconds per image by training the system to pairingly synthesize optoacoustic signals with ground-truth images. DeepMB promises to equip clinicians with immediate access to high-quality MSOT images, regardless of the patient's condition or scanned body area. The technology could extend to other imaging modalities, such as ultrasound, x-ray, and MRI, potentially changing how diseases are diagnosed and treated. Exciting prospects The development of DeepMB is a significant leap in optoacoustic imaging, promising to enhance healthcare outcomes. As DeepMB evolves, it could become integral to modern medical imaging, delivering high-quality results at previously unattainable speeds. (source) P.S. If you like this kind of analysis, I write a free newsletter that unpacks the most significant news and research in AI. Google, Meta, and OpenAI professionals are already subscribed submitted by /u/orthomax23 [link] [comments]  ( 9 min )
    [D] ForeCastNet. Neural PDEs perform global weather simulation 4 to 5 orders of magnitude faster than traditional numerical methods.
    submitted by /u/moschles [link] [comments]  ( 9 min )
    Data labeling service for keypoints / pose [D]
    I was previously using scale.ai but they have been extraordinarily slow. Does anyone have recommendations for services to label keypoints or pose? Bonus points if the labeling service is able to handle 3D / multi angle data coming from multiple cameras. I work in an academic lab and scale is <10k images per batch. submitted by /u/researchrig [link] [comments]  ( 9 min )
    [D] Need help with text-to-song diffusion model architecture
    Hey, I want to make a text-to-song diffusion model, but I can't figure out the architecture. I have already prepared a dataset of about 5000 songs of different genres and artists. It only contains the lyrics, the genre, and the song itself. Do I understand correctly that I should just encode the text and genre into one vector using CLIP and hope that the model will directly follow it (not skipping words and lines), or should I somehow add timestamps to the dataset (when, where, and what text is sung)? I was inspired by Chirp V1. submitted by /u/Head-Selection-9785 [link] [comments]  ( 9 min )
  • Open

    IBM's NorthPole chip runs AI image recognition 22x faster than current chips
    IBM has developed a chip called NorthPole that runs AI-based image recognition 22 times faster than current chips on the market. The chip uses a two-dimensional array of memory blocks and interconnected CPUs to process data quickly. However, it can only run specialized AI processes and not training processes or large language models. The researchers plan to test connecting multiple NorthPole chips together to overcome this limitation. Source : https://techxplore.com/news/2023-10-ibm-northpole-chip-ai-based-image.html submitted by /u/NuseAI [link] [comments]
    Email Ai
    is there a website or some Ai to help me clean my inbox, stop receiving emails from certain senders etc etc... I've heard about: Sanebox for keeping your inbox organized Mailbutler for gathering contact details and tasks EmailTree for creating AI-powered workflows But they are paid and I'm looking for free alternatives submitted by /u/JOTA-137_0 [link] [comments]
    Microsoft CEO Satya Nadella talks AI, closing the Activision Blizzard deal, and his best business decision so far
    submitted by /u/thisisinsider [link] [comments]
    Medical Student Question: Why aren't there any programs that do differential diagnosis for doctors?
    Based on the input you have. This would be like an enterprise-software-level program, I guess: you would input the history, and then by trawling through data locally it could generate diseases and the probability that the patient has each disease based on the data entered. Why doesn't something like this already exist? I am learning how to do differential diagnosis now, and it seems to use an extremely rudimentary understanding of probability to diagnose things. You use clusters of symptoms and then use tests to eliminate stuff in the differential. It just seems like low-hanging fruit that a program could do using tech we already have (I imagine LLMs will make it easier). submitted by /u/derpgod123 [link] [comments]
    Tried visualizing an entire script using Dall-E 3 and these are the results.
    https://preview.redd.it/vi9wx005ksvb1.jpg?width=1024&format=pjpg&auto=webp&s=75502abcae7f2337693175101cb3491b8647d70d Revived an old script and made some images for it using Dall-E 3, just to test out the workflow: https://docs.google.com/document/d/1yyWRRmd0ah5Z4u8_aNYSq9csJ8pccP24Dcs9brPHbzs/edit Was pretty fun and I think by the end I got much better at learning how to maintain the consistency between characters, direct shots, etc. -~- submitted by /u/Kulimar [link] [comments]
    Combining Thermodynamics and Diffusion Models for Collision-Free Robot Motion Planning
    Researchers from Yonsei University and UC Berkeley recently developed a new AI method for enabling autonomous robots to navigate unfamiliar environments filled with obstacles using only visual data as input. The key innovation is a customized diffusion model. Diffusion models can generate diverse motion plans by adding controlled noise. The researchers tailored the model to mimic how heat avoids insulation when dispersing through space. Similar to heat navigating around insulators, this "collision-avoiding" diffusion model learns to predict robot motions that avoid collisions with obstacles. It generates reachable goals and viable motion plans to those goals simultaneously. In simulations, this approach achieved ~98% success rates in navigating to target destinations while avoiding randomly generated obstacles using only visual map images as input. While extensive real-world testing is still needed (only 2D, only simulation), these initial results showcase promising capabilities: Enables navigation in unfamiliar environments without pre-mapping. Flexibly identifies and progresses toward reachable goals. Avoids unnecessary sensing systems for obstacle avoidance. Learns complex collision avoidance heuristics from visual data. I like the thermo + AI + robotics combination here - takes me back to my days in aerospace engineering. Pretty interesting approach. Full summary is here. Paper here. submitted by /u/Successful-Western27 [link] [comments]
    I upgraded my AI girlfriend… and now she remembers stuff about me..
    submitted by /u/spaceecon [link] [comments]
    Self-learning AI Movement Prediction: Beyond Airstriker Genesis to multi-directional predictions
    Quick update on my self-learning software experiment: Thanks to your feedback, I decided to test my prediction system on a newer tower-defense game from the Apple App Store (simply called ‘The Tower’). What's crucial to remember is that this system is not pre-trained and only learns from the current game it encounters - it starts with zero knowledge and learns exclusively from the game it's currently playing, building from the ground up without the use of deep learning or neural networks. In this game (unlike Airstriker which I’ve previously used), players don't control a spaceship or fire weapons (you play the game by ‘upgrading’ your weapons, etc.). It's simpler because there's only one type of enemy that always approaches the center, so the system cannot demonstrate its capabilities for differentiation in this case. But this simplicity presents some other interesting challenges: Enemies approach from all 360-degree directions, pushing the boundaries of the path prediction software. They overlap during explosions, demanding the system to separate them. There's also more visual clutter, including static lines and a non-black background. The system's predictive performance has been remarkably strong. I’ve put together an overlay video to visually demonstrate how the system learns and adapts in this new game. Note: If things don’t align perfectly in there, it’s due to my poor video editing skills… Your feedback is appreciated as always! submitted by /u/_timmah_ [link] [comments]
    A scary thought...
    Without us, artificial intelligence just becomes intelligence submitted by /u/cognaceast [link] [comments]
    AI RPG DALL-E 3
    submitted by /u/the_anonymizer [link] [comments]
    Could machine learning produce a "simple" AI algorithm that performs better than what a human programmer could create in a reasonable amount of time?
    Let me clarify what I'm asking through an example: Artificial Intelligence in videogames has failed to develop in any meaningful way over the past two decades, at least as far as the typical end-user is concerned, and nowhere is this more apparent than in strategy games. Whether we're talking about the 90's or today, AI opponents typically have to receive significant cheats in order to provide a challenging experience for the player. This is widely considered undesirable, can harm immersion or a sense of fair-play, and leads to the concept of "cheesing" the AI (exploiting obvious weaknesses in the AI logic, something which is sometimes necessary if an AI receives such strong bonuses that any strategy you might attempt against another human player would be impossible to execute successfull…
  • Open

    Stability of a superegg
    Three weeks ago I wrote about supereggs, a shape popularized by Piet Hein. One aspect of supereggs that I did not address is their stability. Looking at the photo above, you could imagine that if you gave the object a slight nudge it would not fall over. Your intuition would be right: supereggs are stable. […] Stability of a superegg first appeared on John D. Cook.  ( 6 min )
    Best-of-five versus Best-of-seven
    Suppose that when Team X and Team Y play, the probability that X will win a single game is p and the probability that Y will win is q = 1 − p. What is the probability that X will win the majority of a series of N games for some odd number N? We know intuitively […] Best-of-five versus Best-of-seven first appeared on John D. Cook.  ( 5 min )
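    The series question above can be sketched numerically. Winning a best-of-N series has the same probability as winning the majority of N games when all N are played, so a binomial tail sum is enough. The function name below is hypothetical, and this is only a minimal sketch of the setup described in the excerpt, not the post's own code.

```python
from math import comb


def series_win_probability(p: float, n: int) -> float:
    """Probability that X wins the majority of n games (n odd),
    where X wins each game independently with probability p."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    q = 1.0 - p
    wins_needed = n // 2 + 1
    # Sum the binomial probabilities of X winning k = wins_needed..n games.
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(wins_needed, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, round(series_win_probability(0.6, n), 4))
```

    For p above 0.5 the longer series favors the stronger team more, which matches the intuition the excerpt alludes to.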
  • Open

    How the Self Play algorithm masters Multi-Agent AI
    submitted by /u/AvvYaa [link] [comments]
    Mujoco RL Robotic Arm
    Hi everyone, I'm new to robotic arms and I want to learn more about how to implement them using mujoco env. I'm looking for some open-source projects on github that I can run and understand. I tried MuJoCo_RL_UR5 repo but it didn't work well for me, it only deployed a random agent. Do you have any recommendations for good repos that are beginner-friendly and well-documented? submitted by /u/satyamstar [link] [comments]
    Why does the Bellman equation converge?
    After multiple iterations, the value function converges under Bellman updates (the value iteration algorithm). Can someone provide an intuitive explanation of why the value converges? submitted by /u/RaceCondition01 [link] [comments]
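    One standard intuition is that, with a discount factor gamma below 1, each Bellman backup is a contraction: the largest change between successive value estimates shrinks by at least a factor of gamma per sweep, so the updates settle on a fixed point. The sketch below illustrates this on a made-up two-state MDP; the transition table, rewards, and names are all hypothetical.

```python
GAMMA = 0.9

# Hypothetical toy MDP: MDP[state][action] = [(probability, next_state, reward), ...]
MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def bellman_backup(values):
    """One sweep of value iteration: best expected one-step return per state."""
    return [
        max(
            sum(p * (r + GAMMA * values[s2]) for p, s2, r in outcomes)
            for outcomes in MDP[s].values()
        )
        for s in sorted(MDP)
    ]


def value_iteration(sweeps):
    """Run a fixed number of sweeps and record the max change after each one."""
    values = [0.0] * len(MDP)
    gaps = []
    for _ in range(sweeps):
        new_values = bellman_backup(values)
        gaps.append(max(abs(a - b) for a, b in zip(new_values, values)))
        values = new_values
    return values, gaps


if __name__ == "__main__":
    values, gaps = value_iteration(50)
    print(values)
    print(gaps[:5])  # each gap is at most GAMMA times the previous one
```

    Because the gaps shrink geometrically, the total remaining error after k sweeps is bounded by a multiple of gamma to the power k, which is why the iterates stop moving in practice.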
  • Open

    Replay game input with image classification
    TensorFlow Keras correcting camera horizon in AC Valhalla https://www.youtube.com/watch?v=ASy-2zOMj_Y submitted by /u/Kostiantyn-Dvornik [link] [comments]
    How do I determine the number of layers and the number of neurons in each?
    Hello, I’m a newbie in neural networks and I wonder: how do people decide how many hidden layers there will be and how many neurons will be in each? What is the logic behind it? submitted by /u/Particular-Song-633 [link] [comments]
    Unboxing Neuro Symbolic Reasoning and Learning
    submitted by /u/Neurosymbolic [link] [comments]

  • Open

    Google, other search engines' use of generative AI threatens $68B SEO industry
    The rise of generative AI in search engines like Google threatens the $68 billion search engine optimization (SEO) industry. Generative AI tools like ChatGPT aim to provide direct answers to user queries, bypassing the need for users to click on search results. This could render SEO efforts useless and impact the revenues of SEO consultants and search engines. However, generative AI search engines still face challenges such as providing incorrect or plagiarized answers, and gaining user trust and loyalty. Search engines have been quick to experiment with generative AI to improve search results, with Google's Bard, Microsoft's Bing AI, Baidu's ERNIE, and DuckDuckGo's DuckAssist being examples of this approach. As the quality of AI-generated answers improves, users will have less incentive to browse through search result listings, impacting the revenues of SEO consultants and search engines. The SEO industry generated $68.1 billion globally in 2022 and was expected to reach $129.6 billion by 2030, but the emergence of generative AI puts the industry at risk of obsolescence. Generative AI search engines are still in their infancy and face challenges such as providing incorrect or plagiarized answers, limiting their trust and loyalty among users. However, with the resources available to researchers, it is safe to assume that generative AI models will improve over time, leading to the potential death of the SEO industry. Source : https://theconversation.com/why-google-bing-and-other-search-engines-embrace-of-generative-ai-threatens-68-billion-seo-industry-210243 submitted by /u/NuseAI [link] [comments]
    Experimented with Fully Automating TikTok Video Creation Using AI for a Month - Here's What I Learned
    Hi everyone, I recently undertook a personal project where I tried to automate the entire process of creating TikTok videos using various AI tools. The goal was to see how advanced we've come in terms of AI's capabilities in content creation and to explore the nuances of automating a traditionally 'human' task. Here's a brief breakdown: Scripting: Leveraged ChatGPT for generating video scripts. Voiceovers: Used ElevenLabs for lifelike voice narration. Video Creation: Employed a combination of StableDiffusion Animate & Replicate. Editing: Automated the editing process to sync with the AI-generated voiceovers. After setting everything up, I ran the system for a month, generating 3 videos daily. The results were intriguing and a mix of expected and unexpected outcomes. Would love to hear thoughts, feedback, or similar experiences from the community. Are there other creative ways you've seen or used AI in content creation? submitted by /u/General_crypto [link] [comments]
    AI RPG (Dall-E 3)
    submitted by /u/the_anonymizer [link] [comments]
    Thanks to AI, the future of programming may involve YELLING IN ALL CAPS
    The future of programming may involve human-like communication techniques, including yelling in all caps. OpenAI's DALL-E 3 AI image generator integrated into ChatGPT revealed internal prompts shared between the image generator and the AI assistant. The prompts included commands written in all-caps for emphasis. This shows that programming and communicating with computers may become more human-like in the future. Previously, programs used specialized data formats and APIs to communicate, but now large language models allow for cross-program interaction in conventional English. OpenAI trained GPT-4, the AI model used in ChatGPT DALL-E interface, on hundreds of millions of documents scraped from the web, which included instances of polite language and reactions to it. The use of all-caps in the DALL-E message is interpreted as emphasis, and the model pays more attention to capitalized sentences. In the future, programming and communicating with computers may involve more emphasis and human-like communication techniques. Source : https://arstechnica.com/information-technology/2023/10/thanks-to-ai-the-future-of-programming-may-involve-yelling-in-all-caps/ submitted by /u/NuseAI [link] [comments]
    Close up view of rain hitting dust.
    submitted by /u/IllustriousVideo6145 [link] [comments]
    Impressive
    submitted by /u/the_anonymizer [link] [comments]
    Singularity Pinball.
    submitted by /u/Philipp [link] [comments]
    One-Minute Daily AI News 10/21/2023
    This dating app SciMatch uses AI to find your soulmate by your face. Snap a selfie, and let the app do the rest.[1] The Biden administration is reducing the types of semiconductors that American companies will be able to sell to China, citing the desire to close loopholes in existing regulations announced last year.[2] Business Schools Are Adding AI Education Into The Curriculum.[3] Google Pixel’s face-altering photo tool sparks AI manipulation debate.[4] Sources: [1] https://www.foxnews.com/tech/dating-app-uses-ai-find-soul-mate-face [2] https://www.cnn.com/2023/10/18/tech/us-china-chip-export-curbs-intl-hnk/index.html [3] https://www.entrepreneur.com/business-news/business-schools-are-adding-ai-education-for-future-ceos/464054 [4] https://www.bbc.com/news/technology-67170014 submitted by /u/Excellent-Target-847 [link] [comments]
    Training AI to Play Pokemon with Reinforcement Learning
    submitted by /u/ShooBum-T [link] [comments]
    ChatGPT and Bard cannot solve every problem for you.
    My last post in this thread got almost 90k views, honestly I'm very happy that I was able to be so helpful. ​ One guy asked me why I couldn't give more details about what tools I use and what tools help me?:/ I decided to make the top 24 tools and describe what they are responsible for in 2 words. In order not to violate the rules of r/artificial I decided not to leave direct links to tools, so as not to violate the rules, as some tools can be paid, I left only links to 2 resources where I took this information, but they are fortunately free. YouTube Summaries → http://eightify.app 3D Animations → http://moviebot.io AI Assistant → http://zipzap.ai Prompts → http://wnr.ai How-to-videos → http://teachomatic.net Custom AI chatbots ➝ http://chatling.ai Remove Background ➝ http://unscreen.com Forms ➝ http://feathery.io Presentations ➝ http://beautiful.ai Learning ➝ http://albus.org Blog ➝ http://jasper.ai Videos ➝ http://descript.com Image ➝ http://tryleap.ai Resume ➝ http://mosaicml.com Grammar Check ➝ http://trinka.ai Meeting ➝ http://krisp.ai Video ➝ http://decoherence.co App development ➝ http://brancher.ai Design ➝ http://modiphy.com Coding assistant ➝ http://bito.ai Twitter assistant ➝ http://tweethunter.io Personal assistant ➝ http://chat.openai.com LinkedIn assistant ➝ http://taplio.com YouTube assistant ➝ http://vidiq.com I hope this is as useful to you as the first post I'm just sharing my experiences and observations in the field of ai. LIST AND SITE https://preview.redd.it/zgkra3plpgvb1.jpg?width=1068&format=pjpg&auto=webp&s=779003d65dfa70c58d50ad690a0e436c735cdaeb submitted by /u/PerceptionPlayful469 [link] [comments]
    Oracle loops in Nvidia's AI stack for end-to-end model development
    Oracle has partnered with Nvidia to bring Nvidia's AI stack to its marketplace, giving Oracle customers access to top-of-the-line GPUs for training models and building generative applications. Eligible enterprises can purchase Nvidia's DGX Cloud AI supercomputing platform and AI Enterprise software directly from the marketplace and start training models for deployment on the Oracle Cloud Infrastructure. Nvidia DGX Cloud offers a serverless experience for multi-node training of custom generative AI models, supporting near-limitless scale of GPU resources. Nvidia AI Enterprise helps teams accelerate the deployment of models to production, with features such as the Nvidia NeMo framework, Rapids, TensorRT LLM open-source library, and Triton Inference server. Oracle has been focused on industry partnerships for its AI efforts and has announced generative AI capabilities in its products and solutions. Source : https://venturebeat.com/ai/oracle-loops-in-nvidias-ai-stack-for-end-to-end-model-development/ submitted by /u/NuseAI [link] [comments]
  • Open

    Policy Evaluation
    I know that given a policy, I can find the value function using iterative policy evaluation. Can I, given the value function, find the policy? submitted by /u/MomoSolar [link] [comments]
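    A partial answer can be sketched under two assumptions that the post does not state: the transition model is known, and the value function is the optimal one. In that case acting greedily with a one-step lookahead recovers an optimal policy. For the value function of an arbitrary policy, the same lookahead gives a policy-improvement step rather than the original policy. The toy MDP, its numbers, and the function name are hypothetical.

```python
def greedy_policy(values, mdp, gamma):
    """Pick, in each state, the action with the best one-step lookahead value.

    mdp[state][action] is a list of (probability, next_state, reward) tuples.
    """
    policy = {}
    for state, actions in mdp.items():
        def lookahead(action, actions=actions):
            # Expected immediate reward plus discounted value of the next state.
            return sum(p * (r + gamma * values[s2]) for p, s2, r in actions[action])

        policy[state] = max(actions, key=lookahead)
    return policy


# Hypothetical two-state MDP and its (pre-computed) optimal values for gamma = 0.9.
TOY_MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
TOY_VALUES = [15.2 / 0.82, 20.0]

if __name__ == "__main__":
    print(greedy_policy(TOY_VALUES, TOY_MDP, gamma=0.9))
```

    Without a model, the state-value function alone is not enough for this lookahead, which is one reason model-free methods usually learn action values instead.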
    Question on advantage (re-)computation for PPO
    Hi, I've been re-reading the "What matters in on-policy reinforcement learning" paper (https://arxiv.org/abs/2006.05990) and noticed that they suggest recomputing advantages at the beginning of each epoch (choice C5, see section 3.5 and appendix B.1). I was wondering: has anyone here already tried this and seen a significant improvement (which is what the paper suggests)? Doesn't it also imply recomputing the value targets at the beginning of each epoch, which could lead to some sort of moving-target issue? Best, submitted by /u/Scrimbibete [link] [comments]
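    To make the question concrete, here is a minimal sketch of Generalized Advantage Estimation for a single trajectory without episode boundaries. It is not the paper's code; the function name, argument names, and default coefficients are assumptions. Recomputing per epoch would amount to re-running the current value network on the stored observations and calling this function again with the fresh value predictions.

```python
def compute_gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards[t] and values[t] are per-step rewards and value predictions;
    bootstrap_value is the value prediction for the state after the last step.
    Returns (advantages, value_targets).
    """
    advantages = [0.0] * len(rewards)
    next_value = bootstrap_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    # A common choice of value target: advantage plus the value prediction used.
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets
```

    With this choice of target, recomputing advantages from fresh value predictions also changes the value targets, which is the moving-target concern raised in the post; keeping the targets from the first epoch fixed while refreshing only the advantages is the alternative design.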
    In RL, how can we reward an action taken 5 steps ago?
    Let us say we are building a model that will learn how to play a computer game like DOTA or League of Legends. If the model, for example, buys weapon A and uses the item's ability on opponent B, it should learn how much damage that does given the items opponent B is wearing. But the model will have taken a lot of other actions before it could use that weapon, so the reward for what it did (how much damage it dealt) only arrives later. How do you assign a delayed reward to a specific action taken X steps ago? Thank you. submitted by /u/oniongarlic88 [link] [comments]
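    One standard way to handle this is to credit every earlier action with the discounted sum of the rewards that follow it. The sketch below is a minimal illustration with an assumed discount factor and a made-up reward sequence; it spreads credit over all preceding actions rather than singling out the purchase.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode: the weapon is bought at step 0 and the damage reward
# only arrives at step 5.
rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
```

    With these numbers the action at step 0 receives a return of 0.9 to the power 5, so the late reward still reaches it, only discounted.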
    Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains"
    In another installment of our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker of the "Blog der großen Fragen" and I had the great honor and pleasure of conducting an interview with the very well-known and renowned German medical doctor and neuroscientist Professor Dr. Petra Ritter. In this context, Ms. Ritter became a co-founder and leader of the co-design project "The Virtual Brain", which is a component of the European Open Science Cloud (EOSC) and is "a neuroinformatics platform for simulating whole brain networks using biologically realistic connectivity". She is leading the development of a virtual research environment as a collaborative research platform for sensitive health data, is head of the "German National Neuroscience Research Infrastructure Initiative (NFDI-Neuroscience)", and is involved in the development of the "Health Data Cloud EBRAINS". Petra Ritter has been Johanna Quandt Professor and Head of the Section for Brain Simulation at the Department of Neurology with Experimental Neurology at Charité - Universitätsmedizin Berlin since 2017. There, Professor Ritter and her team are involved in the "Simulation of Brains". More at: https://philosophies.de/index.php/2023/09/17/die-simulation-von-gehirnen/ https://preview.redd.it/937m7mtyvivb1.jpg?width=1000&format=pjpg&auto=webp&s=22d1a7576f2ebbe7904f0187bd7c0234df7ddb8f submitted by /u/philosophiesde [link] [comments]
  • Open

    [D] PRML reading buddy
    Hey there mates, I am a 3rd year PhD student, trying to break into good quality research (tired of trying different permutations and combinations of Xs and Ys, and hitting a dead end when things don't work, or worse, being unable to explain why things work :D). I have recently decided to read PRML cover to cover (slowly) and do some of the exercises as well. The goal is to finish in 6 months (2 chapters per month). Is there anyone on a similar journey? I would love to tag along and discuss nuances. submitted by /u/Zealousideal_Yak9131 [link] [comments]  ( 9 min )
    [D] What do you all think of these pearls of wisdom on “Doing Great Research”?
    About the latest Jason Wei’s tweet. submitted by /u/mildlyphd [link] [comments]  ( 9 min )
    [D] Which is the best physics engine for reinforcement learning??
    What are some of the best physics engines to use for complex reinforcement learning tasks (like humanoid motion)? I came across MuJoCo, PhysX, PyBullet, Isaac, etc., but I'm not sure which to go with. Isaac seems very interesting, but the minimum requirement per the website is 32 GB of RAM, which is way too much for me (I have 8 GB). MuJoCo is good, but the docs are very confusing and hard to get through. What do you believe is the best choice to go with? submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [D] Ensemble of Strong vs Weak Predictors
    This crossed my mind recently and after searching online I couldn't find a concrete answer: would an ensemble composed of strong predictors (let's say training on 1 model of that type had a high metric performance) perform better than an ensemble composed of weak predictors? Bonus: are there any resources that would support your position you can link below? submitted by /u/robml [link] [comments]  ( 9 min )
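    As a reference point for the discussion, the sketch below computes the accuracy of a majority vote over n binary classifiers that each have accuracy p, under the strong assumption that their errors are independent. That assumption rarely holds in practice (strong models trained on the same data tend to make correlated mistakes), so this is an idealised calculation and not a prediction for real ensembles. The function name is illustrative.

```python
from math import comb


def majority_vote_accuracy(n, p):
    """Probability that more than half of n independent voters are correct.

    Assumes n is odd (no ties) and each voter is correct with probability p,
    independently of the others.
    """
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


weak = majority_vote_accuracy(3, 0.6)    # three weak voters
strong = majority_vote_accuracy(3, 0.9)  # three strong voters
```

    Under independence both ensembles beat their individual members, and the strong ensemble still comes out ahead; the interesting cases arise when the strong predictors are much more correlated than the weak ones.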
    [R] Decoupling Features and Classes with Self-Organizing Class Embeddings
    submitted by /u/4rtemi5 [link] [comments]  ( 9 min )
    [P] [D] Hierarchical agent learns all possible policies. Would this implementation work?
    Here's my implementation of an idea I had many years ago: a Sensorimotor Inference Engine. It is a machine that explores the state space of an environment, learning how to traverse that state space and how to manipulate the environment, so that when given a goal it can manipulate the environment in accordance with the goal. In other words, it's an agent which learns not one policy, but all possible policies. Doing so, I believe, requires a hierarchy: layers of the same structure which learn broader and broader contexts of the environment. I have recently attempted to design an extremely simple and modularized version of this agent: the Encoder-Predictor-Actor circuit. I need feedback: do you think it would work? If it might work, how might I train the Actor model? I think I know how to train the Encoder and Predictor models, but the Actor model will be harder to train, so if you have any ideas I'd love to hear from you! P.S. Sorry for the typos in the image text. A first-pass diagram of the 'simplest' implementation of a sensorimotor inference engine: the encoder-predictor-actor circuit submitted by /u/Stack3 [link] [comments]  ( 9 min )
    Career suggestions [D]
    Hi there, I need some suggestions from you experts. I am an aerospace engineer (both BSc and MSc), with a university minor in AI. It's pretty clear to me that I should have studied computer science given my passion for this world. In the last 4 years I worked as an engineer in a major aerospace company, and I managed to get back on track with computer science and ML by working as a data scientist and doing ML projects applied to space, while also practicing with LLM agents. My dream is to enter the AGI world, maybe working as an "AI engineer", or working on creating true "autonomous" systems, perhaps leveraging multi-modal models. What do you suggest I should focus on to reach this goal? Getting some "credit" as an ML engineer first through courses and certifications, open source projects, or maybe applying right now to some startups in the field? Thanks guys! submitted by /u/cappellino1 [link] [comments]  ( 9 min )
    A[r]xiv Dives - Generating Speech from Text with Fast Speech-2
    We’ve been diving deep into Arxiv Papers as a team on Fridays, hope it’s helpful and feel free to join live if you like the format! submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [R] Eureka: Human-Level Reward Design via Coding Large Language Models
    submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    Computer Vision Project Ideas [Project]
    I am taking the computer vision course at my university. We have to do a final project but I am unable to come up with concrete ideas. These are the options: • Select a paper from the computer vision literature, implement and test the approach described in that paper • Take publicly available code, apply it to an interesting novel dataset and explore various extensions and modifications. You may also want to compare two or more systems. Running existing code on the data provided by the authors is not sufficient. • Design and implement a solution to a problem that interests you. This may earn you extra credits. Can anyone please help with what to do? submitted by /u/kxenak [link] [comments]  ( 9 min )
    [D] Can you use a different dataset to run ablation experiments?
    I am working on a computer vision algorithm and I will be benchmarking my method on the MS COCO dataset, like the other methods that have been proposed for the same problem. I want to know if I can use a smaller dataset (COCO minitrain) for my ablation experiments, to demonstrate the efficacy of the different components used in my algorithm and to save time and cost, or whether that will be a red flag to journal reviewers. submitted by /u/notEVOLVED [link] [comments]  ( 9 min )
    Searching pinecone for relative date information [R]
    I am embedding large medical reports with GPT and upserting them into Pinecone, and then I would like to query for chronological results. For example, I upload a report that consists of 10 office visits. I would like to know the date and results of the first visit and then of the last visit. When I embed a query such as "How did the patient describe their pain in the last office visit in the text?", Pinecone doesn't understand the context of 'last', since it is just computing cosine similarity. It pulls pain information but doesn't have a clue which visit comes first. Any help would be greatly appreciated. submitted by /u/Silent_Case_3058 [link] [comments]  ( 9 min )
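    One possible approach, sketched below in plain Python and not against the Pinecone client, is to store the visit date as metadata next to each chunk, retrieve by similarity first, and then order the hits by date. The chunk layout, the field names `vec` and `visit_date`, and the toy 2-D vectors are all hypothetical; a real system would use the embedding model's vectors and the vector store's own metadata fields.

```python
from datetime import date
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_visit(chunks, query_vec, which="last", top_k=3):
    """Keep the top_k most similar chunks, then pick the earliest or latest by date."""
    ranked = sorted(chunks, key=lambda c: cosine(c["vec"], query_vec), reverse=True)
    by_date = sorted(ranked[:top_k], key=lambda c: c["visit_date"])
    return by_date[-1] if which == "last" else by_date[0]


# Hypothetical chunks: three mention pain, one is unrelated but the most recent.
chunks = [
    {"id": "A", "vec": [1.0, 0.1], "visit_date": date(2023, 1, 5)},
    {"id": "B", "vec": [0.9, 0.2], "visit_date": date(2023, 3, 10)},
    {"id": "C", "vec": [1.0, 0.0], "visit_date": date(2023, 2, 1)},
    {"id": "D", "vec": [0.0, 1.0], "visit_date": date(2023, 4, 1)},
]
pain_query = [1.0, 0.0]
last_hit = retrieve_visit(chunks, pain_query, which="last")
first_hit = retrieve_visit(chunks, pain_query, which="first")
```

    The similarity step handles "pain" and the date sort handles "first" or "last", so the embedding never has to encode chronology.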
    [P] Wizard101 Auto-Buyer Script/Bot - Using OCR, OpenCV Python with multiprocessor performance improvements
    submitted by /u/HistorianCrafty3514 [link] [comments]  ( 8 min )
    [P] [D] : RAG on multilevel tabular data
    Hi, Has anyone done RAG on a multi level tabular data? If yes then what problems have you faced and how did you solve those? My model gives better answers when I converted the data to a JSON and then embedded it. But I'm looking for a better approach. submitted by /u/Euphoric-Chart1428 [link] [comments]  ( 9 min )
    [D] Is Megabyte's padding the same as streamingLLM?
    I was wondering after reading the recent streamingLLM paper https://arxiv.org/pdf/2309.17453.pdf if the attention sink they use through pre-training and inference is analogous to the learnable padding used in the MEGABYTE architecture https://arxiv.org/pdf/2305.07185.pdf although used for a different purpose? So if I just used MEGABYTE with sliding window attention at inference would it be the same as streamingLLM? submitted by /u/Additional-Ad-7043 [link] [comments]  ( 9 min )
    [D] cloud computing vs personal for ML
    I need a new PC to run NN on. My training sets are about 50GB. Would I be best building my own, or using Google colab pro? Anyone know the specs equivalent to colab Pro? submitted by /u/ajplant [link] [comments]  ( 9 min )
    [D] What is the current SOTA of self-supervised knowledge graph models?
    I want to create a research proposal in this area. Ideally, I would like to work towards self-supervised models that take as input raw (not preprocessed) data of various modalities (text, image, video, audio, ...) and output a knowledge graph of all the data contained within. For example, I could feed it the Wikipedia article about dogs and it spits back all the information contained within, structured in the form of a graph. For people who work in the same general area can you point me to the SOTA models/efforts and research groups that work in this area? And can you also highlight the current challenges to be overcome, if you are deep enough to know? submitted by /u/KlutzyBiz [link] [comments]
    [D] Encoder vs Decoder Transformer for Token Classification
    Hi. I am working on a token classification problem that requires significant language understanding in the base model, and I was wondering: (1) Is there any research showing, on multiple datasets, that encoder-only pretraining tasks produce better results when fine-tuned for token classification than decoder-only pretraining with models of the same parameter size? (2) Since a lot of LLM research is focused on text generation, most models are trained on decoder-only pretraining tasks, so what is the largest encoder-only pretrained model that is trained on >1T tokens? (3) If encoder-only models do indeed produce better results for token classification, is there any empirical rule with respect to parameter size at which we can expect decoder-only models to outperform encoder-only models (e.g., say a 3B decoder-only model is equivalent to a 1B encoder-only model with similar pretraining and fine-tuning data)? submitted by /u/RemoteSaint [link] [comments]  ( 9 min )
    [D] Need some practical advice on choosing from different CNN model architectures.
    Hi everyone. I would just like to discuss a few things. I've spent about 2 months studying CNNs on coursera from the Deep Learning Specialization. In this time period I learnt the fundamentals and mechanisms of how CNNs work. I also took lectures on a few research papers that studied a few classical CNN models like AlexNet, LeNet-5, VGG-16. And then a few research papers that studied advanced stuff like ResNets, Inception Network, MobileNet, EfficientNet etc. Following that I studied Detection Algorithms, with a primary focus on YOLO Algorithm. I also briefly studied Regional Proposals, Semantic Segmentation, R-CNN, Fast-RCNN, Faster R-CNN, U-Net. I also learnt Face Recognition and Verification Models like Siamese Network using Triplet Loss function and Binary Classification. And also cove…  ( 10 min )
    [D] [P] Web browsing UI-based AI agent: GPT-4V-Act
    Github: GPT-4V-Act (A demo video can be found on the Github) Hi there! I'd like to share with you a project I recently developed. My inspiration came from a recent post about Set-of-Mark visual grounding in GPT-4V. Fascinatingly, my tests showed that GPT-4V, equipped with this capability, could inspect a UI screenshot and provide the precise pixel coordinates needed for steering a mouse/keyboard to perform a specified task. Motivated by this, I built a proof-of-concept web browser embedded with a co-pilot that can "view" the browser and interact with it. Currently, the demo is basic, utilizing web-scraping to morph ChatGPT Plus into an unofficial GPT-4V API at the backend. It lacks some actions and an adblock, resulting in the agent potentially being overloaded by the extensive popups …  ( 10 min )
  • Open

    Grade School & Preteen AI & Data Literacy
    I recently wrote the book “AI & Data Literacy: Empowering Citizens of Data Science” to help non-data scientists – which is most of the world – understand the risks associated with how companies capture and use your personal data to influence your viewing and buying habits… and even your political and societal beliefs.  And while… Read More »Grade School & Preteen AI & Data Literacy The post Grade School & Preteen AI & Data Literacy appeared first on Data Science Central.  ( 22 min )
  • Open

    Is there any neural network or LLM like chatgpt,midjourney that can help us train and generate custom sounds
    Generating a wide variety of sounds: I'm a non-technical person with very little knowledge of developing AI tools, and I intend to learn Python. With that in mind, my question is as follows. Are there tools or ChatGPT-like platforms that can help people like me generate a couple of sounds, like dog barks and cat meows? I want either something that can generate a variety of sounds, or to work towards making something that can help me generate audio like dog barks (fierce, aggressive ones, for example), and not just dog barks but also sounds of nature, other animals, vehicles, machinery (e.g., honks, engine sounds), and possibly human sounds (though that's not my primary focus for now). Technical assistance needed: I also came across a tool called Teachable Machine and was wondering if it could be a solution, as it does offer tools for audio. I am also aware that I would need datasets for such a task, but apart from that I am not too sure about the intricacies involved or the knowledge needed, and I assume it is not very easy. https://www.youtube.com/watch?v=L4GOmYPPqn8&t=1854s [Teachable Machine](https://teachablemachine.withgoogle.com/) Inspiration: I was inspired by a project I found here: [https://x.com/TheAIAnonGuy/status/1684443155448360961?s=20] Can anyone provide insights, guidance, or recommendations on how to accomplish this? To be fair, I'm not really sure if this is an audio-related question or a neural network/machine learning (ML)/deep learning question. But I would like more insight into whether this is possible on an individual scale, with Teachable Machine, code, AI, or a combination of all approaches, and whether there are any beginner-friendly ways to achieve this. Thank you all for your assistance! submitted by /u/Beginning_Finding_98 [link] [comments]
  • Open

    A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm. (arXiv:2310.12244v1 [cs.LG])
    Domain incremental learning aims to adapt to a sequence of domains with access to only a small subset of data (i.e., memory) from previous domains. Various methods have been proposed for this problem, but it is still unclear how they are related and when practitioners should choose one method over another. In response, we propose a unified framework, dubbed Unified Domain Incremental Learning (UDIL), for domain incremental learning with memory. Our UDIL **unifies** various existing methods, and our theoretical analysis shows that UDIL always achieves a tighter generalization error bound compared to these methods. The key insight is that different existing methods correspond to our bound with different **fixed** coefficients; based on insights from this unification, our UDIL allows **adaptive** coefficients during training, thereby always achieving the tightest bound. Empirical results show that our UDIL outperforms the state-of-the-art domain incremental learning methods on both synthetic and real-world datasets. Code will be available at https://github.com/Wang-ML-Lab/unified-continual-learning.  ( 2 min )
    Cooperative Minibatching in Graph Neural Networks. (arXiv:2310.12403v1 [cs.LG])
    Significant computational resources are required to train Graph Neural Networks (GNNs) at a large scale, and the process is highly data-intensive. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. GNNs have the unique property that items in a minibatch have overlapping data. However, the commonly implemented Independent Minibatching approach assigns each Processing Element (PE) its own minibatch to process, leading to duplicated computations and input data access across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which is the main bottleneck limiting scaling. To reduce the effects of NEP in the multi-PE setting, we propose a new approach called Cooperative Minibatching. Our approach capitalizes on the fact that the size of the sampled subgraph is a concave function of the batch size, leading to significant reductions in the amount of work per seed vertex as batch sizes increase. Hence, it is favorable for processors equipped with a fast interconnect to work on a large minibatch together as a single larger processor, instead of working on separate smaller minibatches, even though global batch size is identical. We also show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches. Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, by simply increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.  ( 3 min )
    Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer. (arXiv:2310.12442v1 [cs.CL])
    Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks. These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence. However, the (full) attention mechanism incurs high computational cost - quadratic in the sequence length, which is not affordable in tasks with long sequences, e.g., inputs with 8k tokens. Although sparse attention can be used to improve computational efficiency, as suggested in existing work, it has limited modeling capacity and often fails to capture complicated dependencies in long sequences. To tackle this challenge, we propose MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans. Specifically, MASFormer is equipped with full attention to capture long-range dependencies, but only at a small number of layers. For the remaining layers, MASFormer only employs sparse attention to capture short-range dependencies. Our experiments on natural language modeling and generation tasks show that a decoder-only MASFormer model of 1.3B parameters can achieve competitive performance to vanilla transformers with full attention while significantly reducing computational cost (up to 75%). Additionally, we investigate the effectiveness of continual training with long sequence data and how sequence length impacts downstream generation performance, which may be of independent interest.  ( 2 min )
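    The mixed-span idea can be illustrated with attention masks alone. The sketch below assumes a causal decoder and uses a sliding window as the sparse pattern; the window size, the number of layers, and which layers get full attention are hypothetical choices for illustration and are not taken from the paper.

```python
def causal_mask(n):
    """Full causal attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    """Sparse causal attention: token i attends only to its last `window` tokens."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


def layer_masks(num_layers, full_layers, n, window):
    """Full attention at the layers in `full_layers`, sliding window elsewhere."""
    return [
        causal_mask(n) if layer in full_layers else sliding_window_mask(n, window)
        for layer in range(num_layers)
    ]


def attended_pairs(mask):
    """Number of (query, key) pairs a mask allows, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


masks = layer_masks(num_layers=4, full_layers={0}, n=8, window=2)
costs = [attended_pairs(m) for m in masks]
```

    The full layer's cost grows quadratically with sequence length while the windowed layers grow linearly, which is where the savings come from when only a few layers are full.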
    How a student becomes a teacher: learning and forgetting through Spectral methods. (arXiv:2310.12612v1 [cs.LG])
    In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network. This latter should be to some extent reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the student candidate network. Unfortunately, state-of-the-art conventional learning techniques could not help in identifying the existence of such an invariant subnetwork, due to the inherent degree of non-convexity that characterizes the examined problem. In this work, we take a leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors with negligible increase in terms of computational and complexity load, as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure, that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When pruning unimportant nodes of the trained student, as follows a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.  ( 3 min )
    An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning. (arXiv:2310.12274v1 [cs.CV])
    Textual Inversion, a prompt learning method, learns a singular embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying and integrating multiple object-level concepts within one scene poses significant challenges even when embeddings for individual concepts are attainable. This is further confirmed by our empirical tests. To address this challenge, we introduce a framework for Multi-Concept Prompt Learning (MCPL), where multiple new "words" are simultaneously learned from a single sentence-image pair. To enhance the accuracy of word-concept correlation, we propose three regularisation techniques: Attention Masking (AttnMask) to concentrate learning on relevant areas; Prompts Contrastive Loss (PromptCL) to separate the embeddings of different concepts; and Bind adjective (Bind adj.) to associate new "words" with known words. We evaluate via image generation, editing, and attention visualisation with diverse images. Extensive quantitative comparisons demonstrate that our method can learn more semantically disentangled concepts with enhanced word-concept correlation. Additionally, we introduce a novel dataset and evaluation protocol tailored for this new task of learning object-level concepts.  ( 2 min )
    No-Regret Learning in Bilateral Trade via Global Budget Balance. (arXiv:2310.12370v1 [cs.GT])
    Bilateral trade revolves around the challenge of facilitating transactions between two strategic agents -- a seller and a buyer -- both of whom have private valuations for the item. We study the online version of the problem, in which at each time step a new seller and buyer arrive. The learner's task is to set a price for each agent, without any knowledge about their valuations. The sequence of sellers and buyers is chosen by an oblivious adversary. In this setting, known negative results rule out the possibility of designing algorithms with sublinear regret when the learner has to guarantee budget balance for each iteration. In this paper, we introduce the notion of global budget balance, which requires the agent to be budget balanced only over the entire time horizon. By requiring global budget balance, we provide the first no-regret algorithms for bilateral trade with adversarial inputs under various feedback models. First, we show that in the full-feedback model the learner can guarantee $\tilde{O}(\sqrt{T})$ regret against the best fixed prices in hindsight, which is order-wise optimal. Then, in the case of partial feedback models, we provide an algorithm guaranteeing a $\tilde{O}(T^{3/4})$ regret upper bound with one-bit feedback, which we complement with a nearly-matching lower bound. Finally, we investigate how these results vary when measuring regret using an alternative benchmark.  ( 2 min )
    Automated Repair of Declarative Software Specifications in the Era of Large Language Models. (arXiv:2310.12425v1 [cs.SE])
    The growing adoption of declarative software specification languages, coupled with their inherent difficulty in debugging, has underscored the need for effective and automated repair techniques applicable to such languages. Researchers have recently explored various methods to automatically repair declarative software specifications, such as template-based repair, feedback-driven iterative repair, and bounded exhaustive approaches. The latest developments in large language models provide new opportunities for the automatic repair of declarative specifications. In this study, we assess the effectiveness of utilizing OpenAI's ChatGPT to repair software specifications written in the Alloy declarative language. Unlike imperative languages, specifications in Alloy are not executed but rather translated into logical formulas and evaluated using backend constraint solvers to identify specification instances and counterexamples to assertions. Our evaluation focuses on ChatGPT's ability to improve the correctness and completeness of Alloy declarative specifications through automatic repairs. We analyze the results produced by ChatGPT and compare them with those of leading automatic Alloy repair methods. Our study revealed that while ChatGPT falls short in comparison to existing techniques, it was able to successfully repair bugs that no other technique could address. Our analysis also identified errors in ChatGPT's generated repairs, including improper operator usage, type errors, higher-order logic misuse, and relational arity mismatches. Additionally, we observed instances of hallucinations in ChatGPT-generated repairs and inconsistency in its results. Our study provides valuable insights for software practitioners, researchers, and tool builders considering ChatGPT for declarative specification repairs.  ( 3 min )
    Classification-Aided Robust Multiple Target Tracking Using Neural Enhanced Message Passing. (arXiv:2310.12407v1 [cs.LG])
    We address the challenge of tracking an unknown number of targets in strong clutter environments using measurements from a radar sensor. Leveraging the range-Doppler spectra information, we identify the measurement classes, which serve as additional information to enhance clutter rejection and data association, thus bolstering the robustness of target tracking. We first introduce a novel neural enhanced message passing approach, where the beliefs obtained by the unified message passing are fed into the neural network as additional information. The output beliefs are then utilized to refine the original beliefs. Then, we propose a classification-aided robust multiple target tracking algorithm, employing the neural enhanced message passing technique. This algorithm is comprised of three modules: a message-passing module, a neural network module, and a Dempster-Shafer module. The message-passing module is used to represent the statistical model by the factor graph and infers target kinematic states, visibility states, and data associations based on the spatial measurement information. The neural network module is employed to extract features from range-Doppler spectra and derive beliefs on whether a measurement is target-generated or clutter-generated. The Dempster-Shafer module is used to fuse the beliefs obtained from both the factor graph and the neural network. As a result, our proposed algorithm adopts a model-and-data-driven framework, effectively enhancing clutter suppression and data association, leading to significant improvements in multiple target tracking performance. We validate the effectiveness of our approach using both simulated and real data scenarios, demonstrating its capability to handle challenging tracking scenarios in practical radar applications.  ( 3 min )
    Safe RLHF: Safe Reinforcement Learning from Human Feedback. (arXiv:2310.12773v1 [cs.AI])
    With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical. However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training. To address this issue, we propose Safe Reinforcement Learning from Human Feedback (Safe RLHF), a novel algorithm for human value alignment. Safe RLHF explicitly decouples human preferences regarding helpfulness and harmlessness, effectively avoiding the crowdworkers' confusion about the tension and allowing us to train separate reward and cost models. We formalize the safety concern of LLMs as an optimization task of maximizing the reward function while satisfying specified cost constraints. Leveraging the Lagrangian method to solve this constrained problem, Safe RLHF dynamically adjusts the balance between the two objectives during fine-tuning. Through a three-round fine-tuning using Safe RLHF, we demonstrate a superior ability to mitigate harmful responses while enhancing model performance compared to existing value-aligned algorithms. Experimentally, we fine-tuned the Alpaca-7B using Safe RLHF and aligned it with collected human preferences, significantly improving its helpfulness and harmlessness according to human evaluations.  ( 2 min )
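    The Lagrangian idea of maximizing reward subject to a cost limit can be shown on a one-dimensional toy problem. This is a generic primal-dual sketch and not the paper's training procedure: the reward and cost functions, the limit, and the step sizes are all made up for illustration. The multiplier `lam` grows while the constraint is violated, which shifts the balance from reward toward cost.

```python
def solve_constrained(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Maximize reward(x) = -(x - 2)^2 subject to cost(x) = x <= 1.

    Gradient ascent on x and projected gradient ascent on the multiplier,
    using the Lagrangian L(x, lam) = reward(x) - lam * (cost(x) - 1).
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = -2.0 * (x - 2.0) - lam           # dL/dx
        x += lr_x * grad_x                        # primal step: improve reward
        lam = max(0.0, lam + lr_lam * (x - 1.0))  # dual step: penalize violation
    return x, lam


x_opt, lam_opt = solve_constrained()
```

    Without the constraint the maximizer would be x = 2; with it the iterates settle at the boundary x = 1 with multiplier 2, the point where the reward gradient and the cost penalty balance.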
    Fast Parameter Inference on Pulsar Timing Arrays with Normalizing Flows. (arXiv:2310.12209v1 [astro-ph.IM])
    Pulsar timing arrays (PTAs) perform Bayesian posterior inference with expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing residuals each, producing a posterior distribution for the stochastic gravitational wave background (SGWB) can take days to a week. The computational bottleneck arises because the likelihood evaluation required for MCMC is extremely costly when considering the dimensionality of the search space. Fortunately, generating simulated data is fast, so modern simulation-based inference techniques can be brought to bear on the problem. In this paper, we demonstrate how conditional normalizing flows trained on simulated data can be used for extremely fast and accurate estimation of the SGWB posteriors, reducing the sampling time from weeks to a matter of seconds.  ( 2 min )
    MuseGNN: Interpretable and Convergent Graph Neural Network Layers at Scale. (arXiv:2310.12457v1 [cs.LG])
    Among the many variants of graph neural network (GNN) architectures capable of modeling data with cross-instance relations, an important subclass involves layers designed such that the forward pass iteratively reduces a graph-regularized energy function of interest. In this way, node embeddings produced at the output layer dually serve as both predictive features for solving downstream tasks (e.g., node classification) and energy function minimizers that inherit desirable inductive biases and interpretability. However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. We also instantiate a full GNN architecture based on these designs, and the model achieves competitive accuracy and scalability when applied to the largest publicly-available node classification benchmark exceeding 1TB in size.  ( 2 min )
    Closed-Form Diffusion Models. (arXiv:2310.12395v1 [cs.LG])
    Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via score-matching. The error in this approximation promotes generalization, but neural SGMs are costly to train and sample, and the effective regularization this error provides is not well-understood theoretically. In this work, we instead explicitly smooth the closed-form score to obtain an SGM that generates novel samples without training. We analyze our model and propose an efficient nearest-neighbor-based estimator of its score function. Using this estimator, our method achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.  ( 2 min )
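The statement that the score can be evaluated in closed form for a finite training set can be made concrete under one common assumption: the perturbation is isotropic Gaussian noise with standard deviation `sigma`. The perturbed target is then a mixture of Gaussians centred on the training points, and its score is a softmax-weighted average of the directions toward those points. The sketch below shows only this unsmoothed score, not the smoothed estimator the paper proposes; the function name is hypothetical.

```python
import math


def closed_form_score(x, data, sigma):
    """Score of (1/N) * sum_i N(x; x_i, sigma^2 I), evaluated at point x.

    x is a list of floats and data is a list of training points of the same length.
    """
    # Squared distance from x to every training point.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in data]
    # Softmax weights over training points, shifted by the max for stability.
    logits = [-d / (2.0 * sigma ** 2) for d in d2]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    w = [wi / z for wi in w]
    # Weighted mean of the training points, then the direction toward it.
    mean = [sum(wi * p[k] for wi, p in zip(w, data)) for k in range(len(x))]
    return [(mk - xk) / sigma ** 2 for mk, xk in zip(mean, x)]
```

As `sigma` shrinks, the weights concentrate on the nearest training point, which is one way to see why sampling with this exact score reproduces the training data.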
    Exploring Graph Neural Networks for Indian Legal Judgment Prediction. (arXiv:2310.12800v1 [cs.LG])
    The burdensome impact of a skewed judges-to-cases ratio on the judicial system manifests in an overwhelming backlog of pending cases alongside an ongoing influx of new ones. To tackle this issue and expedite the judicial process, the proposition of an automated system capable of suggesting case outcomes based on factual evidence and precedent from past cases gains significance. This research paper centres on developing a graph neural network-based model to address the Legal Judgment Prediction (LJP) problem, recognizing the intrinsic graph structure of judicial cases and making it a binary node classification problem. We explored various embeddings as model features, while nodes such as time nodes and judicial acts were added and pruned to evaluate the model's performance. The study takes into account the ethical dimension of fairness in these predictions, considering gender and name biases. A link prediction task is also conducted to assess the model's proficiency in anticipating connections between two specified nodes. By harnessing the capabilities of graph neural networks and incorporating fairness analyses, this research aims to contribute insights towards streamlining the adjudication process, enhancing judicial efficiency, and fostering a more equitable legal landscape, ultimately alleviating the strain imposed by mounting case backlogs. Our best-performing model, with XLNet pre-trained embeddings as its features, achieves a macro F1 score of 75% on the LJP task. For link prediction, the same set of features performs best, giving an ROC of more than 80%.
    Generative modeling, design and analysis of spider silk protein sequences for enhanced mechanical properties. (arXiv:2309.10170v1 [cond-mat.mtrl-sci] CROSS LISTED)
    Spider silks are remarkable materials characterized by superb mechanical properties such as strength, extensibility and lightweightedness. Yet, to date, limited models are available to fully explore sequence-property relationships for analysis and design. Here we propose a custom generative large-language model to enable design of novel spider silk protein sequences to meet complex combinations of target mechanical properties. The model, pretrained on a large set of protein sequences, is fine-tuned on ~1,000 major ampullate spidroin (MaSp) sequences for which associated fiber-level mechanical properties exist, to yield an end-to-end forward and inverse generative strategy. Performance is assessed through: (1) a novelty analysis and protein type classification for generated spidroin sequences through BLAST searches, (2) property evaluation and comparison with similar sequences, (3) comparison of molecular structures, and (4) detailed sequence motif analyses. We generate silk sequences with property combinations that do not exist in nature, and develop a deep understanding of the mechanistic roles of sequence patterns in achieving overarching key mechanical properties (elastic modulus, strength, toughness, failure strain). The model provides an efficient approach to expand the silkome dataset, facilitating further sequence-structure analyses of silks, and establishes a foundation for synthetic silk design and optimization.
    TabuLa: Harnessing Language Models for Tabular Data Synthesis. (arXiv:2310.12746v1 [cs.LG])
    Given the ubiquitous use of tabular data in industries and the growing concerns in data privacy and security, tabular data synthesis emerges as a critical research area. The recent state-of-the-art methods show that large language models (LLMs) can be adopted to generate realistic tabular data. As LLMs pre-process tabular data as full text, they have the advantage of avoiding the curse of dimensionality associated with one-hot encoding high-dimensional data. However, their long training time and limited re-usability on new tasks prevent them from replacing existing tabular generative models. In this paper, we propose Tabula, a tabular data synthesizer based on the language model structure. Through Tabula, we demonstrate the inherent limitation of employing pre-trained language models designed for natural language processing (NLP) in the context of tabular data synthesis. Our investigation delves into the development of a dedicated foundational model tailored specifically for tabular data synthesis. Additionally, we propose a token sequence compression strategy to significantly reduce training time while preserving the quality of synthetic data. Extensive experiments on six datasets demonstrate that using a language model structure without loading the well-trained model weights yields a better starting model for tabular data synthesis. Moreover, the Tabula model, previously trained on other tabular data, serves as an excellent foundation model for new tabular data synthesis tasks. Additionally, the token sequence compression method substantially reduces the model's training time. Results show that Tabula reduces training time per epoch by 46.2% on average compared to the current LLM-based state-of-the-art algorithm and consistently achieves even higher synthetic data utility.
    The Power of Populations in Decentralized Learning Dynamics. (arXiv:2306.08670v2 [cs.LG] UPDATED)
    We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are decentralized: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics with a certain class of "zero-sum" multiplicative weights update algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the size of the population and number of arms) for both the stationary reward setting (where the mean of each arm's distribution is fixed over time) and the adversarial reward setting (where means can vary over time). Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.
    Adaptive Pairwise Encodings for Link Prediction. (arXiv:2310.11009v2 [cs.LG] UPDATED)
    Link prediction is a common task on graph-structured data that has seen applications in a variety of domains. Classically, hand-crafted heuristics were used for this task. Heuristic measures are chosen such that they correlate well with the underlying factors related to link formation. In recent years, a new class of methods has emerged that combines the advantages of message-passing neural networks (MPNN) and heuristics methods. These methods perform predictions by using the output of an MPNN in conjunction with a "pairwise encoding" that captures the relationship between nodes in the candidate link. They have been shown to achieve strong performance on numerous datasets. However, current pairwise encodings often contain a strong inductive bias, using the same underlying factors to classify all links. This limits the ability of existing methods to learn how to properly classify a variety of different links that may form from different factors. To address this limitation, we propose a new method, LPFormer, which attempts to adaptively learn the pairwise encodings for each link. LPFormer models the link factors via an attention module that learns the pairwise encoding that exists between nodes by modeling multiple factors integral to link prediction. Extensive experiments demonstrate that LPFormer can achieve SOTA performance on numerous datasets while maintaining efficiency.
    Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE. (arXiv:2306.03659v2 [cs.AI] UPDATED)
    Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years. These models learn a vector representation of knowledge graph entities and relations, a.k.a. knowledge graph embeddings (KGEs). Learning versatile KGEs is desirable as it makes them useful for a broad range of tasks. However, KGEMs are usually trained for a specific task, which makes their embeddings task-dependent. In parallel, the widespread assumption that KGEMs actually create a semantic representation of the underlying entities and relations (e.g., project similar entities closer than dissimilar ones) has been challenged. In this work, we design heuristics for generating protographs -- small, modified versions of a KG that leverage RDF/S information. The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics. Extensive experiments on various evaluation benchmarks demonstrate the soundness of this approach, which we call Modular and Agnostic SCHema-based Integration of protograph Embeddings (MASCHInE). In particular, MASCHInE helps produce more versatile KGEs that yield substantially better performance for entity clustering and node classification tasks. For link prediction, using MASCHInE substantially increases the number of semantically valid predictions with equivalent rank-based performance.
    Machine Learning Based Compensation for Inconsistencies in Knitted Force Sensors. (arXiv:2306.12129v2 [eess.SY] UPDATED)
    Knitted sensors frequently suffer from inconsistencies due to innate effects such as offset, relaxation, and drift. These properties, in combination, make it challenging to reliably map from sensor data to physical actuation. In this paper, we demonstrate a method for counteracting this by applying processing using a minimal artificial neural network (ANN) in combination with straightforward pre-processing. We apply a number of exponential smoothing filters on a re-sampled sensor signal, to produce features that preserve different levels of historical sensor data and, in combination, represent an adequate state of previous sensor actuation. By training a three-layer ANN with a total of 8 neurons, we manage to significantly improve the mapping between sensor reading and actuation force. Our findings also show that our technique translates to sensors of reasonably different composition in terms of material and structure, and it can furthermore be applied to related physical features such as strain.
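The pre-processing described above, several exponential smoothing filters applied to the same signal so that each feature retains a different amount of history, can be sketched as follows. The smoothing factors in `alphas` and the function name are hypothetical; the abstract does not state the values used.

```python
def smoothing_features(signal, alphas=(0.5, 0.1, 0.01)):
    """Return one feature row per sample, with one smoothed value per alpha.

    A large alpha tracks the latest reading closely; a small alpha keeps a
    long memory of earlier readings. signal must be a non-empty sequence.
    """
    # Initialize every filter state with the first reading.
    states = [signal[0]] * len(alphas)
    rows = []
    for x in signal:
        # Standard exponential smoothing update, applied per filter.
        states = [a * x + (1.0 - a) * s for a, s in zip(alphas, states)]
        rows.append(list(states))
    return rows
```

Each row could then be fed to a small network as its input vector in place of the raw reading.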
    SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training. (arXiv:2310.02227v2 [cs.LG] UPDATED)
    In an era where symbolic mathematical equations are indispensable for modeling complex natural phenomena, scientific inquiry often involves collecting observations and translating them into mathematical expressions. Recently, deep learning has emerged as a powerful tool for extracting insights from data. However, existing models typically specialize in either numeric or symbolic domains, and are usually trained in a supervised manner tailored to specific tasks. This approach neglects the substantial benefits that could arise from a task-agnostic unified understanding between symbolic equations and their numeric counterparts. To bridge the gap, we introduce SNIP, a Symbolic-Numeric Integrated Pre-training, which employs joint contrastive learning between symbolic and numeric domains, enhancing their mutual similarities in the pre-trained embeddings. By performing latent space analysis, we observe that SNIP provides cross-domain insights into the representations, revealing that symbolic supervision enhances the embeddings of numeric data and vice versa. We evaluate SNIP across diverse tasks, including symbolic-to-numeric mathematical property prediction and numeric-to-symbolic equation discovery, commonly known as symbolic regression. Results show that SNIP effectively transfers to various tasks, consistently outperforming fully supervised baselines and competing strongly with established task-specific methods, especially in few-shot learning scenarios where available data is limited.
    Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization. (arXiv:2310.12526v1 [cs.LG])
    Bayesian optimization (BO) is widely used for black-box optimization problems, and has been shown to perform well in various real-world tasks. However, most of the existing BO methods aim to learn the optimal solution, which may become infeasible when the parameter space is extremely large or the problem is time-sensitive. In these contexts, switching to a satisficing solution that requires less information can result in better performance. In this work, we focus on time-sensitive black-box optimization problems and propose satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approaches, including synchronous and asynchronous versions. We shift the target from an optimal solution to a satisficing solution that is easier to learn. The rate-distortion theory is introduced to construct a loss function that balances the amount of information that needs to be learned with sub-optimality, and the Blahut-Arimoto algorithm is adopted to compute the target solution that reaches the minimum information rate under the distortion limit at each step. Both discounted and undiscounted Bayesian cumulative regret bounds are theoretically derived for the proposed STS-PBO approaches. The effectiveness of the proposed methods is demonstrated on a fast-charging design problem of Lithium-ion batteries. The results are consistent with theoretical analyses, and show that our STS-PBO methods outperform both sequential counterparts and parallel BO with traditional Thompson sampling in both synchronous and asynchronous settings.
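For readers unfamiliar with the Blahut-Arimoto algorithm mentioned above, here is a minimal sketch of its textbook rate-distortion form on a discrete source. It is not the STS-PBO procedure itself: the trade-off parameter `beta`, the fixed iteration count, and the function name are assumptions made for illustration.

```python
import math


def blahut_arimoto(p_x, distortion, beta, iters=100):
    """Alternating updates for the rate-distortion problem (iters must be >= 1).

    p_x[x] is the source distribution and distortion[x][j] is the cost of
    reproducing x as j. Returns (q(j|x), q(j)) after `iters` rounds.
    """
    n_hat = len(distortion[0])
    q_hat = [1.0 / n_hat] * n_hat  # start from a uniform reproduction marginal
    cond = []
    for _ in range(iters):
        # Update the conditional: q(j|x) proportional to q(j) * exp(-beta * d(x, j)).
        cond = []
        for x in range(len(p_x)):
            row = [q_hat[j] * math.exp(-beta * distortion[x][j]) for j in range(n_hat)]
            z = sum(row)
            cond.append([r / z for r in row])
        # Update the marginal: q(j) = sum_x p(x) * q(j|x).
        q_hat = [sum(p_x[x] * cond[x][j] for x in range(len(p_x))) for j in range(n_hat)]
    return cond, q_hat
```

A larger `beta` penalizes distortion more heavily and so yields a higher-rate, lower-distortion solution.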
    Constrained Reweighting of Distributions: an Optimal Transport Approach. (arXiv:2310.12447v1 [stat.ML])
    We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
    Transformers for scientific data: a pedagogical review for astronomers. (arXiv:2310.12069v2 [astro-ph.IM] UPDATED)
    The deep learning architecture associated with ChatGPT and related generative AI products is known as transformers. Initially applied to Natural Language Processing, transformers and the self-attention mechanism they exploit have gained widespread interest across the natural sciences. The goal of this pedagogical and informal review is to introduce transformers to scientists. The review includes the mathematics underlying the attention mechanism, a description of the original transformer architecture, and a section on applications to time series and imaging data in astronomy. We include a Frequently Asked Questions section for readers who are curious about generative AI or interested in getting started with transformers for their research problem.
    Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction. (arXiv:2310.11466v2 [cs.LG] UPDATED)
    Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. Existing methods rely heavily on experimental protein structure data and fail in scenarios where these data are unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) have been utilized as alternatives. However, we observed that current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. While similar phenomena have been extensively studied in general fields (e.g., Computer Vision) as model robustness, their impact on protein property prediction remains unexplored. In this paper, we first investigate the reason behind the performance decrease when utilizing predicted structures, attributing it to the structure embedding bias from the perspective of structure representation learning. To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures. Extensive experiments have shown that our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures. The benchmark datasets and codes will be released to benefit the community.
    Efficient Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories. (arXiv:2310.10541v1 [cs.CV] CROSS LISTED)
    Training a large and state-of-the-art machine learning model typically necessitates the use of large-scale datasets, which, in turn, makes the training and parameter-tuning process expensive and time-consuming. Some researchers opt to distil information from real-world datasets into tiny and compact synthetic datasets while maintaining their ability to train a well-performing model, hence proposing a data-efficient method known as Dataset Distillation (DD). Despite recent progress in this field, existing methods still underperform and cannot effectively replace large datasets. In this paper, unlike previous methods that focus solely on improving the efficacy of student distillation, we are the first to recognize the important interplay between expert and student. We argue that expert smoothness has a significant impact when more potent expert trajectories are employed in subsequent dataset distillation. Based on this, we introduce the integration of clipping loss and gradient penalty to regulate the rate of parameter changes in expert trajectories. Furthermore, in response to the sensitivity exhibited towards randomly initialized variables during distillation, we propose a representative initialization for the synthetic dataset and a balanced inner-loop loss. Finally, we present two enhancement strategies, namely intermediate matching loss and weight perturbation, to mitigate the potential occurrence of cumulative errors. We conduct extensive experiments on datasets of different scales, sizes, and resolutions. The results demonstrate that the proposed method significantly outperforms prior methods.
    The Kernel Density Integral Transformation. (arXiv:2309.10194v2 [stat.ML] UPDATED)
    Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering protection from the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.
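One way to see how a single transformation can have min-max scaling and quantile transformation as limiting cases is the sketch below. It assumes a Gaussian kernel and a normalization that maps the training minimum to 0 and the training maximum to 1; both choices, and the function names, are assumptions made for illustration rather than details taken from the abstract.

```python
import math


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kd_integral_transform(train, bandwidth):
    """Build a transform from the integral of a Gaussian KDE over `train`.

    train must contain at least two distinct values. A very large bandwidth
    makes the KDE integral nearly linear between min and max (min-max
    scaling); a very small one makes it a step function (quantile-like).
    """
    lo, hi = min(train), max(train)

    def kde_cdf(x):
        # Integral of the KDE up to x: the average of the per-point Gaussian CDFs.
        return sum(_phi((x - t) / bandwidth) for t in train) / len(train)

    f_lo, f_hi = kde_cdf(lo), kde_cdf(hi)

    def transform(x):
        return (kde_cdf(x) - f_lo) / (f_hi - f_lo)

    return transform
```

For example, with `train = [0.0, 1.0, 2.0, 10.0]`, a bandwidth of 1000 maps 5.0 close to the min-max value 0.5, while a bandwidth of 0.001 maps 1.5 to the middle of the rank order regardless of the outlier at 10.0.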
    Microscaling Data Formats for Deep Learning. (arXiv:2310.10537v3 [cs.LG] UPDATED)
    Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
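The core idea above, one scaling factor shared by a block of narrow elements, can be sketched with integer elements as follows. This is a simplified illustration, not the MX specification: the power-of-two scale, the rounding rule used to pick the exponent, and the function names are assumptions.

```python
import math


def mx_quantize_block(block, elem_bits=8):
    """Quantize a block of floats to signed integers sharing one 2**exp scale."""
    qmax = 2 ** (elem_bits - 1) - 1
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Smallest power-of-two exponent that keeps the largest element in range.
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    # Round to the nearest integer and clamp defensively.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return exp, q


def mx_dequantize_block(exp, q):
    """Recover approximate floats from the shared exponent and integer elements."""
    scale = 2.0 ** exp
    return [scale * v for v in q]
```

Storing one exponent per block instead of one per element is what keeps the per-element width narrow while still covering a wide dynamic range across blocks.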
    Improving Generalization of Alignment with Human Preferences through Group Invariant Learning. (arXiv:2310.11971v2 [cs.LG] UPDATED)
    The success of AI assistants based on large language models (LLMs) hinges crucially on Reinforcement Learning from Human Feedback (RLHF), which enables the generation of responses more aligned with human preferences. As these assistants become universal, there is a growing expectation that they perform consistently across various domains. However, previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples. This focus on quick reward gains undermines both the stability in training and the model's ability to generalize to new, unseen data. In this work, we propose a novel approach that can learn a consistent policy via RL across various data groups or domains. Given the challenges associated with acquiring group annotations, our method automatically classifies data into different groups, deliberately maximizing performance variance. Then, we optimize the policy to perform well on challenging groups. Lastly, leveraging the established groups, our approach adaptively adjusts the exploration space, allocating more learning capacity to more challenging data and preventing the model from over-optimizing on simpler data. Experimental results indicate that our approach significantly enhances training stability and model generalization.
    When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting. (arXiv:2310.11569v2 [cs.LG] UPDATED)
    Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions. Recent state-of-the-art probabilistic forecasting methods also impose hierarchical relations on point predictions and samples of the distribution, which does not account for coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with given hierarchical relations and do not adapt to real-world datasets that show deviation from this assumption. We close both these gaps and propose PROFHiT, which is a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for the entire forecast distribution, which enables robust and calibrated forecasts and adapts to datasets of varying hierarchical consistency. On evaluating PROFHiT over a wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Due to modeling the coherency over the full distribution, we observed that PROFHiT can robustly provide reliable forecasts even if up to 10% of input time-series data is missing, where other methods' performance severely degrades by over 70%.
    Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer. (arXiv:2310.07587v2 [cs.LG] UPDATED)
    Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
    ACES: Generating Diverse Programming Puzzles with Autotelic Language Models and Semantic Descriptors. (arXiv:2310.10692v2 [cs.LG] UPDATED)
    Finding and selecting new and interesting problems to solve is at the heart of curiosity, science and innovation. We here study automated problem generation in the context of the open-ended space of python programming puzzles. Existing generative models often aim at modeling a reference distribution without any explicit diversity optimization. Other methods explicitly optimizing for diversity do so either in limited hand-coded representation spaces or in uninterpretable learned embedding spaces that may not align with human perceptions of interesting variations. With ACES (Autotelic Code Exploration via Semantic descriptors), we introduce a new autotelic generation method that leverages semantic descriptors produced by a large language model (LLM) to directly optimize for interesting diversity, as well as few-shot-based generation. Each puzzle is labeled along 10 dimensions, each capturing a programming skill required to solve it. ACES generates and pursues novel and feasible goals to explore that abstract semantic space, slowly discovering a diversity of solvable programming puzzles in any given run. Across a set of experiments, we show that ACES discovers a richer diversity of puzzles than existing diversity-maximizing algorithms as measured across a range of diversity metrics. We further study whether and in which conditions this diversity can translate into the successful training of puzzle solving models.
    URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates. (arXiv:2307.03810v2 [cs.LG] UPDATED)
    Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .
    On the power of graph neural networks and the role of the activation function. (arXiv:2307.04661v2 [cs.LG] UPDATED)
    In this article we present new results about the expressivity of Graph Neural Networks (GNNs). We prove that for any GNN with piecewise polynomial activations, whose architecture size does not grow with the graph input sizes, there exists a pair of non-isomorphic rooted trees of depth two such that the GNN cannot distinguish their root vertex up to an arbitrary number of iterations. The proof relies on tools from the algebra of symmetric polynomials. In contrast, it was already known that unbounded GNNs (those whose size is allowed to change with the graph sizes) with piecewise polynomial activations can distinguish these vertices in only two iterations. Our results imply a strict separation between bounded and unbounded size GNNs, answering an open question formulated by [Grohe, 2021]. We next prove that if one allows activations that are not piecewise polynomial, then in two iterations a single neuron perceptron can distinguish the root vertices of any pair of non-isomorphic trees of depth two (our results hold for activations like the sigmoid, hyperbolic tangent and others). This shows how the power of graph neural networks can change drastically if one changes the activation function of the neural networks. The proof of this result utilizes the Lindemann-Weierstrass theorem from transcendental number theory.
    In-Context Pretraining: Language Modeling Beyond Document Boundaries. (arXiv:2310.10638v2 [cs.CL] UPDATED)
    Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts, but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs' performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
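The document-ordering idea in this abstract can be illustrated with a toy sketch. The greedy rule below (always move to the most similar unvisited document) is an assumption chosen for brevity, not the paper's approximate algorithm; `order_documents` and the similarity matrix are hypothetical.

```python
def order_documents(similarity, start=0):
    """Greedy traversal: visit every document exactly once, always moving
    to the most similar document that has not been visited yet.

    `similarity` is a symmetric matrix (list of lists) of pairwise scores.
    This is an illustrative stand-in, not the paper's algorithm.
    """
    n = len(similarity)
    order = [start]
    visited = {start}
    while len(order) < n:
        current = order[-1]
        # Pick the unvisited document most similar to the current one.
        nxt = max(
            (j for j in range(n) if j not in visited),
            key=lambda j: similarity[current][j],
        )
        order.append(nxt)
        visited.add(nxt)
    return order


# Hypothetical similarity scores for four documents.
sim = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.4],
    [0.2, 0.8, 0.4, 1.0],
]
print(order_documents(sim))
```

Consecutive documents in the returned order would then be concatenated into one training context, so that earlier documents carry signal for later ones.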
    HGCVAE: Integrating Generative and Contrastive Learning for Heterogeneous Graph Learning. (arXiv:2310.11102v3 [cs.LG] UPDATED)
    Generative self-supervised learning (SSL) has exhibited significant potential and garnered increasing interest in graph learning. In this study, we aim to explore the problem of generative SSL in the context of heterogeneous graph learning (HGL). The previous SSL approaches for heterogeneous graphs have primarily relied on contrastive learning, necessitating the design of complex views to capture heterogeneity. However, existing generative SSL methods have not fully leveraged the capabilities of generative models to address the challenges of HGL. In this paper, we present HGCVAE, a novel contrastive variational graph auto-encoder that liberates HGL from the burden of intricate heterogeneity capturing. Instead of focusing on complicated heterogeneity, HGCVAE harnesses the full potential of generative SSL. HGCVAE innovatively consolidates contrastive learning with generative SSL, introducing several key innovations. Firstly, we employ a progressive mechanism to generate high-quality hard negative samples for contrastive learning, utilizing the power of variational inference. Additionally, we present a dynamic mask strategy to ensure effective and stable learning. Moreover, we propose an enhanced scaled cosine error as the criterion for better attribute reconstruction. As an initial step in combining generative and contrastive SSL, HGCVAE achieves remarkable results compared to various state-of-the-art baselines, confirming its superiority.
    Deep Probabilistic Movement Primitives with a Bayesian Aggregator. (arXiv:2307.05141v2 [cs.RO] UPDATED)
    Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Other works have proposed neural network-based motor primitive models and demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, no single unified deep motor primitive model has been proposed that is capable of all the previous operations, which limits the potential applications of neural motor primitives. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows sounder context conditioning and blending. Our results demonstrate that our approach can scale to reproduce complex motions on a larger variety of input choices than baselines while maintaining the operations that linear movement primitives provide.
    CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling. (arXiv:2309.05270v2 [cs.CL] UPDATED)
    The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been effective on many NLP tasks. However, NLM for CM is an under-explored area. Though transformers are capable and powerful, they cannot always encode positional information since they are non-recurrent. Therefore, to enrich word information and incorporate positional information, positional encoding is defined. We hypothesize that Switching Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2 or L2 -> L1), pose a challenge for CM Language Models (LMs), and hence give special emphasis to SPs in the modeling process. We experiment with several positional encoding mechanisms and show that rotatory positional encodings along with switching point information yield the best results. We introduce CONFLATOR: a neural language modeling approach for code-mixed languages. CONFLATOR tries to learn to emphasize switching points using smarter positional encoding, both at unigram and bigram levels. CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English (Hinglish): (i) sentiment analysis and (ii) machine translation.
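For readers unfamiliar with the positional encoding this abstract builds on, here is a minimal sketch of standard rotary position embeddings (RoPE). It assumes that the abstract's "rotatory positional encodings" refers to this scheme, and it does not reproduce CONFLATOR's switching-point handling; `rope` and `dot` are illustrative names.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply standard rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i+1]) is rotated by an angle that
    grows with `position` and shrinks with the pair index.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
# Attention scores depend only on the relative offset (here 2), not on
# the absolute positions of the query and the key.
print(abs(dot(rope(q, 5), rope(k, 3)) - dot(rope(q, 12), rope(k, 10))) < 1e-9)
```

The relative-offset property is what makes rotary encodings a natural place to inject extra positional signals, such as the location of a language switch.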
    Evaluating Superhuman Models with Consistency Checks. (arXiv:2306.09983v3 [cs.LG] UPDATED)
    If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.
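One of the consistency rules mentioned in this abstract, that forecasts of a sports record should evolve monotonically, can be checked mechanically. The sketch below is an illustration under assumed inputs: the function name and the forecast values are hypothetical and are not taken from the paper.

```python
def monotonicity_violations(forecasts, direction="non_increasing"):
    """Return consecutive (year, value) pairs that break monotonicity.

    `forecasts` is a list of (year, predicted_record) tuples. A record time
    (e.g. a sprint) can only stay the same or improve, so predictions for
    later years must not be worse than those for earlier years.
    """
    ordered = sorted(forecasts)
    violations = []
    for (y0, v0), (y1, v1) in zip(ordered, ordered[1:]):
        if direction == "non_increasing":
            bad = v1 > v0
        else:
            bad = v1 < v0
        if bad:
            violations.append(((y0, v0), (y1, v1)))
    return violations


# Hypothetical model forecasts of a 100 m record, in seconds.
model_forecasts = [(2025, 9.56), (2030, 9.50), (2035, 9.54), (2040, 9.48)]
print(monotonicity_violations(model_forecasts))
```

A non-empty result flags a logical inconsistency without requiring anyone to know the true future record.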
    Provable Guarantees for Neural Networks via Gradient Feature Learning. (arXiv:2310.12408v1 [cs.LG])
    Neural networks have achieved remarkable empirical performance, while the current theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent Kernel approach fails to capture their key feature learning ability, while recent analyses on feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications in several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis.
    Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching. (arXiv:2306.07960v2 [cs.LG] UPDATED)
    Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances. This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations. We arrive at this finding analytically, by establishing that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an orthogonal frame. Extensive experiments conducted across various datasets, architectures, and imbalance scenarios corroborate our finding. Importantly, our experiments reveal that the inclusion of the ReLU activation restores symmetry without compromising test accuracy. This constitutes the first geometry characterization of SCL under imbalances. Additionally, our analysis and experiments underscore the pivotal role of batch selection strategies in representation geometry. By proving necessary and sufficient conditions for mini-batch choices that ensure invariant symmetric representations, we introduce batch-binding as an efficient strategy that guarantees these conditions hold.
    Automatic Prompt Optimization with "Gradient Descent" and Beam Search. (arXiv:2305.03495v2 [cs.CL] UPDATED)
    Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO), which is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language "gradients" that criticize the current prompt. The gradients are then "propagated" into the prompt by editing the prompt in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure which significantly improves algorithmic efficiency. Preliminary results across three benchmark NLP tasks and the novel problem of LLM jailbreak detection suggest that Automatic Prompt Optimization can outperform prior prompt editing techniques and improve an initial prompt's performance by up to 31%, by using data to rewrite vague task descriptions into more precise annotation instructions.
    On the Design Fundamentals of Diffusion Models: A Survey. (arXiv:2306.04542v3 [cs.LG] UPDATED)
    Diffusion models are generative models, which gradually add and remove noise to learn the underlying distribution of training data for data generation. The components of diffusion models have gained significant attention with many design choices proposed. Existing reviews have primarily focused on higher-level solutions, thereby covering less on the design fundamentals of components. This study seeks to address this gap by providing a comprehensive and coherent review on component-wise design choices in diffusion models. Specifically, we organize this review according to their three key components, namely the forward process, the reverse process, and the sampling procedure. This allows us to provide a fine-grained perspective of diffusion models, benefiting future studies in the analysis of individual components, the applicability of design choices, and the implementation of diffusion models.
    Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning. (arXiv:2306.00477v4 [cs.CL] UPDATED)
    Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach: it trains only a small number of parameters without sacrificing performance and has become the de facto learning paradigm as PLMs grow in size. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for the gradient calculation, akin to fine-tuning. One effective way to reduce the activation memory is to apply a reversible model, so that the intermediate activations need not be cached and can be recomputed. Nevertheless, modifying a PLM to its reversible variant is not straightforward, since the reversible model has a distinct architecture from the currently released PLMs. In this paper, we first investigate what makes existing PEFT methods successful, and find that it is essential to preserve the PLM's starting point when initializing a PEFT method. With this finding, we propose memory-efficient fine-tuning (MEFT) that inserts adapters into a PLM, preserving the PLM's starting point and making it reversible without additional pre-training. We evaluate MEFT on the GLUE benchmark and five question-answering tasks with various backbones: BERT, RoBERTa, BART and OPT. MEFT significantly reduces the activation memory up to 84% of full fine-tuning with a negligible amount of trainable parameters. Moreover, MEFT achieves the same score on GLUE and a comparable score on the question-answering tasks as full fine-tuning. A similar finding is also observed for the image classification task.
    ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations. (arXiv:2306.08141v2 [cs.AI] UPDATED)
    As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
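The steerability metric described above is an expected hitting time in a Markov chain. A minimal sketch of that computation follows; the three-state chain and its transition probabilities are hypothetical, and the paper's fitted chains may be structured differently.

```python
def expected_hitting_time(P, target):
    """Expected number of steps to reach any state in `target`, per start state.

    P is a row-stochastic transition matrix (list of lists). Solves
    h_i = 1 + sum_j P[i][j] * h_j for non-target states, with h_i = 0 on
    the target, using Gauss-Jordan elimination.
    """
    n = len(P)
    free = [i for i in range(n) if i not in target]
    m = len(free)
    # Augmented system (I - Q) h = 1, with Q the non-target block of P.
    A = [
        [(1.0 if r == c else 0.0) - P[free[r]][free[c]] for c in range(m)] + [1.0]
        for r in range(m)
    ]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("target is unreachable from some state")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    h = [0.0] * n
    for k, s in enumerate(free):
        h[s] = A[k][m] / A[k][k]
    return h


# Hypothetical score states: 0 = poor, 1 = close, 2 = adequate (absorbing).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.25, 0.50],
    [0.00, 0.00, 1.00],
]
print(expected_hitting_time(P, target={2}))
```

Under this reading, a smaller expected hitting time from the starting state means fewer prompts are needed on average, that is, a more steerable task.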
    Quasi Manhattan Wasserstein Distance. (arXiv:2310.12498v1 [cs.LG])
    The Quasi Manhattan Wasserstein Distance (QMWD) is a metric designed to quantify the dissimilarity between two matrices by combining elements of the Wasserstein Distance with specific transformations. It offers improved time and space complexity compared to the Manhattan Wasserstein Distance (MWD) while maintaining accuracy. QMWD is particularly advantageous for large datasets or situations with limited computational resources. This article provides a detailed explanation of QMWD, its computation, complexity analysis, and comparisons with WD and MWD.
    Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling. (arXiv:2310.12421v1 [cs.LG])
    This paper proposes the use of causal modeling to detect and mitigate algorithmic bias. We provide a brief description of causal modeling and a general overview of our approach. We then use the Adult dataset, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on gender bias and the problem of binary classification. We show that gender bias in the prediction model is statistically significant at the 0.05 level. We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation. Furthermore, we show that the overall classification accuracy is improved slightly. Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R. Hence, it enhances explainability and promotes trust.
    Online Resource Allocation in Episodic Markov Decision Processes. (arXiv:2305.10744v3 [cs.DS] UPDATED)
    This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process with an unknown non-stationary transition function and stochastic non-stationary reward and resource consumption functions. We propose the observe-then-decide regime and improve the existing decide-then-observe regime, while the two settings differ in how the observations and feedback about the reward and resource consumption functions are given to the decision-maker. We develop an online dual mirror descent algorithm that achieves near-optimal regret bounds for both settings. For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ where $\rho\in(0,1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes. For the decide-then-observe regime, we show that the regret against the static optimal policy that has access to the mean reward and mean resource consumption functions is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ with high probability. We test the numerical efficiency of our method for a variant of the resource-constrained inventory management problem.
    Kepler: Robust Learning for Faster Parametric Query Optimization. (arXiv:2306.06798v2 [cs.DB] UPDATED)
    Most existing parametric query optimization (PQO) techniques rely on traditional query optimizer cost models, which are often inaccurate and result in suboptimal query performance. We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. Central to our method is Row Count Evolution (RCE), a novel plan generation algorithm based on perturbations in the sub-plan cardinality space. While previous approaches require accurate cost models, we bypass this requirement by evaluating candidate plans via actual execution data and training an ML model to predict the fastest plan given parameter binding values. Our models leverage recent advances in neural network uncertainty in order to robustly predict faster plans while avoiding regressions in query performance. Experimentally, we show that Kepler achieves significant improvements in query runtime on multiple datasets on PostgreSQL.
    Connecting Multi-modal Contrastive Representations. (arXiv:2305.14381v2 [cs.LG] UPDATED)
    Multi-modal Contrastive Representation (MCR) learning aims to encode different modalities into a semantically aligned shared space. This paradigm shows remarkable generalization ability on numerous downstream tasks across various modalities. However, the reliance on massive high-quality data pairs limits its further development on more modalities. This paper proposes a novel training-efficient method for learning MCR without paired data called Connecting Multi-modal Contrastive Representations (C-MCR). Specifically, given two existing MCRs pre-trained on (A, B) and (B, C) modality pairs, we project them to a new space and use the data from the overlapping modality B to align the two MCRs in the new space. Meanwhile, since the modality pairs (A, B) and (B, C) are already aligned within each MCR, the connection learned by the overlapping modality can also be transferred to the non-overlapping modality pair (A, C). To unleash the potential of C-MCR, we further introduce a semantic-enhanced inter- and intra-MCR connection method. We first enhance the semantic consistency and completion of embeddings across different modalities for more robust alignment. Then we utilize the inter-MCR alignment to establish the connection, and employ the intra-MCR alignment to better maintain the connection for inputs from non-overlapping modalities. To demonstrate the effectiveness of C-MCR, we connect CLIP and CLAP via texts to derive audio-visual representations, and integrate CLIP and ULIP via images for 3D-language representations. Remarkably, without using any paired data, C-MCR for audio-visual achieves state-of-the-art performance on audio-image retrieval, audio-visual source localization, and counterfactual audio-image recognition tasks. Furthermore, C-MCR for 3D-language also attains advanced zero-shot 3D point cloud classification accuracy on ModelNet40.
    Seeing double with a multifunctional reservoir computer. (arXiv:2305.05799v2 [math.DS] UPDATED)
    Multifunctional biological neural networks exploit multistability in order to perform multiple tasks without changing any network properties. Enabling artificial neural networks (ANNs) to obtain certain multistabilities in order to perform several tasks, where each task is related to a particular attractor in the network's state space, naturally has many benefits from a machine learning perspective. Given the association to multistability, in this paper we explore how the relationship between different attractors influences the ability of a reservoir computer (RC), which is a dynamical system in the form of an ANN, to achieve multifunctionality. We construct the 'seeing double' problem to systematically study how an RC reconstructs a coexistence of attractors when there is an overlap between them. As the amount of overlap increases, we discover that for multifunctionality to occur, there is a critical dependence on a suitable choice of the spectral radius for the RC's internal network connections. A bifurcation analysis reveals how multifunctionality emerges and is destroyed as the RC enters a chaotic regime that can lead to chaotic itinerancy.
    The Adaptive $\tau$-Lasso: Robustness and Oracle Properties. (arXiv:2304.09310v2 [stat.ML] UPDATED)
    This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets subject to gross contamination in the response variables and covariates (explanatory variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points. It also incorporates an adaptive $\ell_1$-norm penalty term, which enables the selection of relevant variables and reduces the bias associated with large true regression coefficients. More specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each regression coefficient. For a fixed number of predictors $p$, we show that the adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection consistency and asymptotic normality. Asymptotic normality applies only to the entries of the regression vector corresponding to the true support, assuming knowledge of the true regression vector support. We characterize its robustness via the finite-sample breakdown point and the influence function. We carry out extensive simulations and observe that the class of $\tau$-Lasso estimators exhibits robustness and reliable performance in both contaminated and uncontaminated data settings. We also validate our theoretical findings on robustness properties through simulation experiments. In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance in terms of prediction and variable selection accuracy compared to other competing regularized estimators for all scenarios considered in this study. Therefore, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively employed for a variety of sparse linear regression problems, particularly in high-dimensional settings and when the data is contaminated by outliers and high-leverage points.
    PGA: Personalizing Grasping Agents with Single Human-Robot Interaction. (arXiv:2310.12547v1 [cs.RO])
    Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that ground and grasp objects based on natural language instructions. While robots capable of recognizing personal objects like "my wallet" can interact more naturally with non-expert users, current LCRG systems primarily limit robots to understanding only generic expressions. To this end, we introduce a task scenario GraspMine with a novel dataset that aims to locate and grasp personal objects given personal indicators via learning from a single human-robot interaction. To address GraspMine, we propose Personalized Grasping Agent (PGA), which learns personal objects by propagating user-given information through a Reminiscence, a collection of raw images from the user's environment. Specifically, PGA acquires personal object information by a user presenting a personal object with its associated indicator, followed by PGA inspecting the object by rotating it. Based on the acquired information, PGA pseudo-labels objects in the Reminiscence by our proposed label propagation algorithm. Harnessing the information acquired from the interactions and the pseudo-labeled objects in the Reminiscence, PGA adapts the object grounding model to grasp personal objects. Experiments on GraspMine show that PGA significantly outperforms baseline methods both in offline and online settings, signifying its effectiveness and personalization applicability to real-world scenarios. Finally, qualitative analysis shows the effectiveness of PGA through a detailed investigation of results in each phase.
    STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models. (arXiv:2310.12667v1 [stat.ML])
    In this paper we propose STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.
    One-shot Empirical Privacy Estimation for Federated Learning. (arXiv:2302.03098v4 [cs.LG] UPDATED)
    Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or to empirically measure privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution), are tailored to specific tasks, model architectures, or DP algorithms, and/or require retraining the model many times (typically on the order of thousands). These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel "one-shot" approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture, task, or DP training algorithm. We show that our method provides provably correct estimates for the privacy loss under the Gaussian mechanism, and we demonstrate its performance on well-established FL benchmark datasets under several adversarial threat models.
    Topic-Level Bayesian Surprise and Serendipity for Recommender Systems. (arXiv:2308.06368v2 [cs.IR] UPDATED)
    A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
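Bayesian surprise is commonly defined as the KL divergence between a posterior and a prior belief. The sketch below applies that definition to smoothed topic counts; the smoothing scheme, function names, and counts are assumptions for illustration, and the paper's topic-level formulation may differ.

```python
import math


def topic_distribution(counts, alpha=1.0):
    """Turn topic counts into a smoothed probability distribution."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def bayesian_surprise(history_counts, item_counts, alpha=1.0):
    """KL(posterior || prior) over topics after consuming one item.

    The prior is the user's topic distribution before the item; the
    posterior is the distribution after adding the item's topic counts.
    """
    prior = topic_distribution(history_counts, alpha)
    updated = [h + i for h, i in zip(history_counts, item_counts)]
    posterior = topic_distribution(updated, alpha)
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior))


# Hypothetical reading history over three topics.
history = [20, 5, 0]
familiar_book = [3, 0, 0]   # more of the user's dominant topic
novel_book = [0, 0, 3]      # a topic the user has never read
print(bayesian_surprise(history, familiar_book) < bayesian_surprise(history, novel_book))
```

An item from an unseen topic shifts the belief more than one from a familiar topic, so it scores as more surprising; serendipity would additionally require that the user rates it highly.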
    Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins. (arXiv:2305.04934v2 [q-bio.BM] CROSS LISTED)
    We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling, based on an attention neural network that integrates transformer and graph convolutional architectures in a causal multi-headed graph mechanism, to realize a generative pretrained model. The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks. Further trained on inverse tasks, the model is rendered capable of designing proteins with these properties as target features. The model is formulated as a general framework, completely prompt-based, and can be adapted for a variety of downstream tasks. We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance, beyond what would be possible by training a model on each dataset alone. Case studies are presented to validate the method, yielding protein designs specifically focused on structural proteins, but also exploring the applicability in the design of soluble, antimicrobial biomaterials. While our model is trained to ultimately perform 8 distinct tasks, with available datasets it can be extended to solve additional problems. In a broader sense, this work illustrates a form of multiscale modeling that relates a set of ultimate building blocks (here, byte-level utf8 characters that define the nature of the physical system at hand) to complex output. This materiomic scheme captures complex emergent relationships between universal building block and resulting properties via a synergizing learning capacity to express a set of potentialities embedded in the knowledge used in training, via the interplay of universality and diversity.
    AdANNS: A Framework for Adaptive Semantic Search. (arXiv:2305.19435v2 [cs.LG] UPDATED)
    Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
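The core idea in this abstract, that cheaper stages of a search pipeline can use a lower-capacity view of the same embedding, can be sketched with a two-stage lookup. This is a minimal illustration only, not AdANNS-IVF or AdANNS-OPQ. It assumes Matryoshka-style embeddings whose leading coordinates are themselves a usable representation, and the function names and parameters (`low_dim`, `shortlist_size`) are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_search(query, database, low_dim, shortlist_size, k):
    """Two-stage nearest-neighbour search over nested embeddings.

    Stage 1 shortlists candidates using only the first `low_dim`
    coordinates (cheap, approximate). Stage 2 re-ranks the shortlist
    with the full vectors (expensive, precise).
    """
    q_low = query[:low_dim]
    coarse = sorted(range(len(database)),
                    key=lambda i: l2(q_low, database[i][:low_dim]))
    shortlist = coarse[:shortlist_size]
    reranked = sorted(shortlist, key=lambda i: l2(query, database[i]))
    return reranked[:k]

# Toy data (assumed): items 1 and 2 look identical in the first two
# dimensions and are only separated by the full vector.
database = [[0, 0, 0, 0], [1, 0, 0, 5], [1, 0, 0, 0], [9, 9, 9, 9]]
nearest = adaptive_search([1, 0, 0, 0], database,
                          low_dim=2, shortlist_size=3, k=1)
```

The accuracy-compute trade-off is controlled by `low_dim` and `shortlist_size`: a smaller prefix makes stage 1 cheaper but increases the chance that the true neighbour is dropped before re-ranking.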
    A path-norm toolkit for modern networks: consequences, promises and challenges. (arXiv:2310.01225v2 [stat.ML] UPDATED)
    This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.
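For intuition about the "ease of computation" and the invariance under network symmetries mentioned here, the sketch below computes the classical L1 path-norm for the simplest case only: a bias-free, layered feedforward ReLU network. In that case the sum over all input-to-output paths of the product of absolute weights equals a single forward pass of an all-ones input through the absolute-valued weight matrices. The extended definition for general DAG networks with biases and pooling is the paper's contribution and is not reproduced here. The function name is hypothetical.

```python
def l1_path_norm(weights):
    """L1 path-norm of a bias-free layered network.

    weights: list of layer matrices; weights[l][j][i] is the weight
             from unit i of layer l to unit j of layer l + 1.
    """
    # Start from an all-ones vector over the input units.
    v = [1.0] * len(weights[0][0])
    for W in weights:
        # Propagate through |W|: each entry accumulates the total
        # absolute path weight reaching that unit.
        v = [sum(abs(w) * x for w, x in zip(row, v)) for row in W]
    return sum(v)

# Toy 2-2-1 network (assumed weights).
W1 = [[1.0, -2.0], [3.0, 0.0]]
W2 = [[1.0, -1.0]]
norm = l1_path_norm([W1, W2])

# Rescaling symmetry of ReLU units: multiplying a hidden unit's incoming
# weights by 2 and its outgoing weights by 0.5 leaves the function, and
# the path-norm, unchanged.
W1_rescaled = [[2.0, -4.0], [3.0, 0.0]]
W2_rescaled = [[0.5, -1.0]]
norm_rescaled = l1_path_norm([W1_rescaled, W2_rescaled])
```

By contrast, the product of per-layer operator norms generally changes under this rescaling, which is one reason path-norms are considered a sharper complexity measure.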
    Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks. (arXiv:2305.12467v3 [cs.LG] UPDATED)
    The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on a linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases from the whole training process showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.
    An Introduction to Transformers. (arXiv:2304.10557v4 [cs.LG] UPDATED)
    The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture. We will not discuss training as this is rather standard. We assume that the reader is familiar with fundamental topics in machine learning including multi-layer perceptrons, linear transformations, softmax functions and basic probability.
    Relational Self-Supervised Learning. (arXiv:2203.08717v2 [cs.CV] UPDATED)
    Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations. However, most methods mainly focus on the instance level information (i.e., the different augmented images of the same instance should have the same feature or cluster into the same class), but there is a lack of attention on the relationships between different instances. In this paper, we introduce a novel SSL paradigm, which we term as relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances. Specifically, our proposed method employs sharpened distribution of pairwise similarities among different instances as relation metric, which is thus utilized to match the feature embeddings of different augmentations. To boost the performance, we argue that weak augmentations matter to represent a more reliable relation, and leverage momentum strategy for practical efficiency. The designed asymmetric predictor head and an InfoNCE warm-up strategy enhance the robustness to hyper-parameters and benefit the resulting performance. Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures, including various lightweight networks (e.g., EfficientNet and MobileNet).
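One way to read the "sharpened distribution of pairwise similarities" used as a relation metric is sketched below: each view of an image is described by its softmax-normalised similarities to a set of other instances, the target view uses a lower temperature (a sharper distribution), and the two distributions are matched with a cross-entropy loss. This is a simplified sketch under stated assumptions, not the authors' implementation. The function names, the anchor set, and the temperature values are hypothetical, and the momentum encoder and predictor head are omitted.

```python
import math

def softmax(scores, temperature):
    """Numerically stable softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relation_loss(student_emb, teacher_emb, anchors,
                  t_student=0.1, t_teacher=0.04):
    """Cross-entropy between two relation distributions.

    Each distribution holds the similarities of one view's embedding to
    the anchor embeddings. The teacher uses a lower temperature, so its
    distribution is sharper (assumed values, for illustration only).
    """
    p_teacher = softmax([dot(teacher_emb, a) for a in anchors], t_teacher)
    p_student = softmax([dot(student_emb, a) for a in anchors], t_student)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Toy example (assumed embeddings): a student view that relates to the
# anchors the same way as the teacher view incurs a lower loss.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = relation_loss([1.0, 0.0], [1.0, 0.0], anchors)
misaligned = relation_loss([0.0, 1.0], [1.0, 0.0], anchors)
```

Unlike an instance-level contrastive loss, nothing here forces the two views to have identical embeddings; they only need to agree on how they relate to other instances.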
    EDGI: Equivariant Diffusion for Planning with Embodied Agents. (arXiv:2303.12410v2 [cs.LG] UPDATED)
    Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group Z, and the object permutation group Sn. EDGI follows the Diffuser framework (Janner et al., 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new SE(3)xZxSn-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier guidance let us softly break the symmetry for specific tasks as needed. On object manipulation and navigation tasks, EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
    Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey. (arXiv:2310.10060v2 [cs.LG] UPDATED)
    Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, revealing that contemporary surveys scarcely capture the breadth of advancements in DA for TSC, prompting us to meticulously analyze over 100 scholarly articles to distill more than 60 unique DA techniques. This rigorous analysis precipitated the formulation of a novel taxonomy, purpose-built for the intricacies of DA in TSC, categorizing techniques into five principal echelons: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment, wherein upwards of 15 DA strategies were subjected to scrutiny across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 ± 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with...
    Can Brain Signals Reveal Inner Alignment with Human Languages? (arXiv:2208.06348v4 [q-bio.NC] UPDATED)
    Brain signals, such as Electroencephalography (EEG), and human languages have been widely explored independently for many downstream tasks; however, the connection between them has not been well explored. In this study, we explore the relationship and dependency between EEG and language. To study at the representation level, we introduced MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities. We used various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transfigure features. On downstream applications, sentiment analysis and relation detection, we achieved new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieved an F1-score improvement of 1.7% on K-EmoCon and 9.3% on ZuCo datasets for sentiment analysis, and 7.4% on ZuCo for relation detection. In addition, we provide interpretations of the performance improvement: (1) feature distribution shows the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) alignment weights show the influence of different language semantics as well as EEG frequency features; (3) brain topographical maps provide an intuitive demonstration of the connectivity in the brain regions. Our code is available at https://github.com/Jason-Qiu/EEG_Language_Alignment.
    DCSI -- An improved measure of cluster separability based on separation and connectedness. (arXiv:2310.12806v1 [stat.ML])
    Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.
    Language-Guided Traffic Simulation via Scene-Level Diffusion. (arXiv:2306.06344v2 [cs.RO] UPDATED)
    Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user's query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
    Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. (arXiv:2306.15687v2 [eess.AS] UPDATED)
    Large-scale generative models such as GPT and DALL-E have revolutionized the research community. These models not only generate high fidelity outputs, but are also generalists which can solve tasks not explicitly taught. In contrast, speech generative models are still primitive in terms of scale and task generalization. In this paper, we present Voicebox, the most versatile text-guided generative model for speech at scale. Voicebox is a non-autoregressive flow-matching model trained to infill speech, given audio context and text, trained on over 50K hours of speech that are not filtered or enhanced. Similar to GPT, Voicebox can perform many different tasks through in-context learning, but is more flexible as it can also condition on future context. Voicebox can be used for mono or cross-lingual zero-shot text-to-speech synthesis, noise removal, content editing, style conversion, and diverse sample generation. In particular, Voicebox outperforms the state-of-the-art zero-shot TTS model VALL-E on both intelligibility (5.9% vs 1.9% word error rates) and audio similarity (0.580 vs 0.681) while being up to 20 times faster. Audio samples can be found at https://voicebox.metademolab.com.
    ROMO: Retrieval-enhanced Offline Model-based Optimization. (arXiv:2310.07560v2 [cs.LG] UPDATED)
    Data-driven black-box model-based optimization (MBO) problems arise in a great number of practical application scenarios, where the goal is to find a design over the whole space maximizing a black-box target function based on a static offline dataset. In this work, we consider a more general but challenging MBO setting, named constrained MBO (CoMBO), where only part of the design space can be optimized while the rest is constrained by the environment. A new challenge arising from CoMBO is that most observed designs that satisfy the constraints are mediocre in evaluation. Therefore, we focus on optimizing these mediocre designs in the offline dataset while maintaining the given constraints rather than further boosting the best observed design in the traditional MBO setting. We propose retrieval-enhanced offline model-based optimization (ROMO), a new derivable forward approach that retrieves the offline dataset and aggregates relevant samples to provide a trusted prediction, and use it for gradient-based optimization. ROMO is simple to implement and outperforms state-of-the-art approaches in the CoMBO setting. Empirically, we conduct experiments on a synthetic Hartmann (3D) function dataset, an industrial CIO dataset, and a suite of modified tasks in the Design-Bench benchmark. Results show that ROMO performs well in a wide range of constrained optimization tasks.
    INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold. (arXiv:2204.07439v3 [cs.CV] UPDATED)
    Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks, but they suffer from quality degradation due to the lack of freedom as activations and weights are constrained to the binary values. To compensate for the accuracy drop, we propose a novel BNN design called Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which controls the quantization threshold dynamically in an input-dependent or instance-aware manner. According to our observation, higher-order statistics can be a representative metric to estimate the characteristics of the input distribution. INSTA-BNN is designed to adjust the threshold dynamically considering various information, including higher-order statistics, but it is also optimized judiciously to realize minimal overhead on a real device. Our extensive study shows that INSTA-BNN outperforms the baseline by 3.0% and 2.8% on the ImageNet classification task with comparable computing cost, achieving 68.5% and 72.2% top-1 accuracy on ResNet-18 and MobileNetV1 based models, respectively.
    Discretize Relaxed Solution of Spectral Clustering via a Non-Heuristic Algorithm. (arXiv:2310.12752v1 [cs.LG])
    Spectral clustering and its extensions usually consist of two steps: (1) constructing a graph and computing the relaxed solution; (2) discretizing relaxed solutions. Although the former has been extensively investigated, the discretization techniques are mainly heuristic methods, e.g., k-means, spectral rotation. Unfortunately, the goal of the existing methods is not to find a discrete solution that minimizes the original objective. In other words, the primary drawback is the neglect of the original objective when computing the discrete solution. Inspired by first-order optimization algorithms, we propose to develop a first-order term to bridge the original problem and the discretization algorithm, which is, to the best of our knowledge, the first non-heuristic method. Since the non-heuristic method is aware of the original graph cut problem, the final discrete solution is more reliable and achieves a preferable loss value. We also theoretically show that the continuous optimum is beneficial to discretization algorithms, though simply finding its closest discrete solution is an existing heuristic algorithm which is also unreliable. Extensive experiments show the superiority of our method.
    Fairness in Streaming Submodular Maximization over a Matroid Constraint. (arXiv:2305.15118v2 [cs.LG] UPDATED)
    Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.
    PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques. (arXiv:2304.12410v2 [cs.CL] UPDATED)
    Recent parameter-efficient finetuning (PEFT) techniques aim to improve over the considerable cost of fully finetuning large pretrained language models (PLM). As different PEFT techniques proliferate, it is becoming difficult to compare them, in particular in terms of (i) the structure and functionality they add to the PLM, (ii) the different types and degrees of efficiency improvements achieved, (iii) performance at different downstream tasks, and (iv) how differences in structure and functionality relate to efficiency and task performance. To facilitate such comparisons, this paper presents a reference architecture which standardises aspects shared by different PEFT techniques, while isolating differences to specific locations and interactions with the standard components. Through this process of standardising and isolating differences, a modular view of PEFT techniques emerges, supporting not only direct comparison of different techniques and their efficiency and task performance, but also systematic exploration of reusability and composability of the different types of finetuned modules. We demonstrate how the reference architecture can be applied to understand properties and relative advantages of PEFT techniques, hence to inform selection of techniques for specific tasks, and design choices for new PEFT techniques.
    Hybrid Search for Efficient Planning with Completeness Guarantees. (arXiv:2310.12819v1 [cs.AI])
    Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.
    When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting. (arXiv:2206.07940v4 [cs.LG] UPDATED)
    Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions. Recent state-of-the-art probabilistic forecasting methods also impose hierarchical relations on point predictions and samples of distribution, which does not account for coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with given hierarchical relations and do not adapt to real-world datasets that show deviation from this assumption. We close both these gaps and propose PROFHiT, which is a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for the entire forecast distribution, which enables robust and calibrated forecasts as well as adaptation to datasets of varying hierarchical consistency. On evaluating PROFHiT over a wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Due to modeling the coherency over the full distribution, we observed that PROFHiT can robustly provide reliable forecasts even if up to 10% of input time-series data is missing, where other methods' performance severely degrades by over 70%.
    Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning. (arXiv:2310.12774v1 [cs.CL])
    Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive properties of gradient-free optimization, proven particularly useful and powerful for model-as-a-service usage. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLM, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.
    Tracking electricity losses and their perceived causes using nighttime light and social media. (arXiv:2310.12346v1 [physics.soc-ph])
    Urban environments are intricate systems where the breakdown of critical infrastructure can impact both the economic and social well-being of communities. Electricity systems hold particular significance, as they are essential for other infrastructure, and disruptions can trigger widespread consequences. Typically, assessing electricity availability requires ground-level data, a challenge in conflict zones and regions with limited access. This study shows how satellite imagery, social media, and information extraction can monitor blackouts and their perceived causes. Night-time light data (in March 2019 for Caracas, Venezuela) is used to indicate blackout regions. Twitter data is used to determine sentiment and topic trends, while statistical analysis and topic modeling delve into public perceptions regarding blackout causes. The findings show an inverse relationship between nighttime light intensity. Tweets mentioning the Venezuelan President displayed heightened negativity and a greater prevalence of blame-related terms, suggesting a perception of government accountability for the outages.
    Causal-structure Driven Augmentations for Text OOD Generalization. (arXiv:2310.12803v1 [cs.LG])
    The reliance of text classifiers on spurious correlations can lead to poor generalization at deployment, raising concerns about their use in safety-critical domains such as healthcare. In this work, we propose to use counterfactual data augmentation, guided by knowledge of the causal structure of the data, to simulate interventions on spurious features and to learn more robust text classifiers. We show that this strategy is appropriate in prediction problems where the label is spuriously correlated with an attribute. Under the assumptions of such problems, we discuss the favorable sample complexity of counterfactual data augmentation, compared to importance re-weighting. Pragmatically, we match examples using auxiliary data, based on diff-in-diff methodology, and use a large language model (LLM) to represent a conditional probability of text. Through extensive experimentation on learning caregiver-invariant predictors of clinical diagnoses from medical narratives and on semi-synthetic data, we demonstrate that our method for simulating interventions improves out-of-distribution (OOD) accuracy compared to baseline invariant learning algorithms.
    Audio Editing with Non-Rigid Text Prompts. (arXiv:2310.12858v1 [cs.SD])
    In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.
    Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data. (arXiv:2301.12321v4 [cs.LG] UPDATED)
    Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying the problematic data by utilizing a largely ignored source of information: a relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of data. We further introduce a visualization tool that provides contextual information of a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performances of our approach on the large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. We release codes at https://github.com/snu-mllab/Neural-Relation-Graph.
    Explanation-Based Training with Differentiable Insertion/Deletion Metric-Aware Regularizers. (arXiv:2310.12553v1 [cs.LG])
    The quality of explanations for the predictions of complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how correctly the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both insertion and deletion scores of the explanations while keeping their predictive accuracy. Since the original insertion and deletion metrics are non-differentiable with respect to the explanations and cannot be used directly for gradient-based optimization, we extend the metrics to be differentiable and use them to formalize insertion and deletion metric-based regularizers. The experimental results on image and tabular datasets show that the deep neural networks-based predictors fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easy-to-interpret explanations while keeping high predictive accuracy.
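For readers unfamiliar with the deletion metric this abstract refers to, the sketch below shows the usual, non-differentiable form: features are removed in decreasing order of attributed importance, the prediction is recorded after each removal, and the area under the resulting curve is the score (lower means the explanation ranked the truly important features first). The hard sort over attributions is what makes the metric unusable for gradient-based training. The paper's differentiable extension is not reproduced here, and the function names and toy model are hypothetical.

```python
def deletion_curve(predict, x, attributions, baseline=0.0):
    """Predictions as features are removed, most important first."""
    order = sorted(range(len(x)), key=lambda i: attributions[i],
                   reverse=True)
    current = list(x)
    scores = [predict(current)]
    for i in order:
        current[i] = baseline  # "delete" the feature
        scores.append(predict(current))
    return scores

def area_under_curve(scores):
    """Trapezoidal area with the x-axis normalised to [0, 1]."""
    n = len(scores) - 1
    return sum((scores[i] + scores[i + 1]) / 2 for i in range(n)) / n

# Toy linear predictor (assumed): feature 0 matters three times as much
# as feature 1.
def predict(v):
    return 3.0 * v[0] + 1.0 * v[1]

x = [1.0, 1.0]
faithful = area_under_curve(deletion_curve(predict, x, [3.0, 1.0]))
unfaithful = area_under_curve(deletion_curve(predict, x, [1.0, 3.0]))
```

The insertion metric is the mirror image: start from the baseline input, add features back in order of importance, and prefer a higher area under the curve.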
    Example-based Hypernetworks for Out-of-Distribution Generalization. (arXiv:2203.14276v3 [cs.CL] UPDATED)
    As Natural Language Processing (NLP) algorithms continually achieve new milestones, out-of-distribution generalization remains a significant challenge. This paper addresses the issue of multi-source adaptation for unfamiliar domains: We leverage labeled data from multiple source domains to generalize to unknown target domains at training. Our innovative framework employs example-based Hypernetwork adaptation: a T5 encoder-decoder initially generates a unique signature from an input example, embedding it within the source domains' semantic space. This signature is subsequently utilized by a Hypernetwork to generate the task classifier's weights. We evaluated our method across two tasks - sentiment classification and natural language inference - in 29 adaptation scenarios, where it outpaced established algorithms. In an advanced version, the signature also enriches the input example's representation. We also compare our finetuned architecture to few-shot GPT-3, demonstrating its effectiveness in essential use cases. To our knowledge, this marks the first application of Hypernetworks to the adaptation for unknown domains.
    KwaiYiiMath: Technical Report. (arXiv:2310.07488v2 [cs.CL] UPDATED)
    Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning. In this report, we introduce KwaiYiiMath, which enhances the mathematical reasoning abilities of KwaiYiiBase by applying Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) on both English and Chinese mathematical tasks. We also construct a small-scale Chinese primary school mathematics test set (named KMath), consisting of 188 examples, to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath achieves state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with models of similar size.
    Convergence of policy gradient methods for finite-horizon stochastic linear-quadratic control problems. (arXiv:2211.00617v2 [math.OC] UPDATED)
    We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
    IC3: Image Captioning by Committee Consensus. (arXiv:2302.01328v3 [cs.CV] UPDATED)
    If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages captions that are "informationally impoverished," and focus on only a subset of the possible details, while ignoring other potentially useful information in the scene. In this work, we introduce a simple, yet novel, method: "Image Captioning by Committee Consensus" (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 at least as helpful as baseline SOTA models more than two thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions, and indicating significant improvements over SOTA approaches for visual description. Code is available at https://davidmchan.github.io/caption-by-committee/
    An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws. (arXiv:2212.01365v2 [cs.LG] UPDATED)
    We study the compute-optimal trade-off between model and training data set sizes for large neural networks. Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla. While that work studies transformer-based large language models trained on the MassiveText corpus (Gopher), as a starting point for development of a mathematical theory, we focus on a simpler learning model and data generating process, each based on a neural network with a sigmoidal output unit and single hidden layer of ReLU activation units. We introduce general error upper bounds for a class of algorithms which incrementally update a statistic (for example gradient descent). For a particular learning model inspired by Barron (1993), we establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes. We then derive allocations of computation that minimize this bound. We present empirical results which suggest that this approximation correctly identifies an asymptotic linear compute-optimal scaling. This approximation also generates new insights. Among other things, it suggests that, as the input dimension or latent space complexity grows, as might be the case for example if a longer history of tokens is taken as input to a language model, a larger fraction of the compute budget should be allocated to growing the learning model rather than training data.
    Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units. (arXiv:2212.09730v2 [cs.SD] UPDATED)
    We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice conversion (VC) methods focus primarily on timbre, and ignore people's unique speaking style (prosody). The proposed approach uses a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and fast to train. All conversion modules are trained only on reconstruction-like tasks, and are thus suitable for any-to-many VC with no paired data. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate that DISSC significantly outperforms the evaluated baselines. Code and samples are available at https://pages.cs.huji.ac.il/adiyoss-lab/dissc/.
    Named Entity Recognition for Monitoring Plant Health Threats in Tweets: a ChouBERT Approach. (arXiv:2310.12522v1 [cs.CL])
    An important application scenario of precision agriculture is detecting and measuring crop health threats using sensors and data analysis techniques. However, the textual data are still under-explored among the existing solutions due to the lack of labelled data and fine-grained semantic resources. Recent research suggests that the increasing connectivity of farmers and the emergence of online farming communities make social media like Twitter a participatory platform for detecting unfamiliar plant health events if we can extract essential information from unstructured textual data. ChouBERT is a French pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards. This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labeled sets.
    Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook. (arXiv:2210.13623v3 [cs.AI] UPDATED)
    In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications including healthcare, finance, recommendation systems, robotics, and last but not least, speech and natural language processing. While most speech and language applications of reinforcement learning algorithms are centered around improving the training of deep neural networks with its flexible optimization properties, there is still much ground to explore in utilizing the benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures and generalizability. In this survey, we present an overview of recent advancements of reinforcement learning and bandits, and discuss how they can be effectively employed to solve speech and natural language processing problems with models that are adaptive, interactive and scalable.
    Deep Discriminative to Kernel Density Networks for Calibrated Inference. (arXiv:2201.13001v6 [cs.LG] UPDATED)
    Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. Many popular methods for in-distribution (ID) calibration, such as isotonic regression and Platt's sigmoidal regression, exhibit excellent ID calibration performance but often at the cost of classification accuracy. Moreover, these methods are not calibrated for the entire feature space, leading to overconfidence in the case of out-of-distribution (OOD) samples. In this paper, we leverage the fact that deep models, including both random forests and deep-nets, learn internal representations which are unions of polytopes with affine activation functions to conceptualize them both as partitioning rules of the feature space. We replace the affine function in each polytope populated by the training data with a Gaussian kernel. We propose sufficient conditions for our proposed methods to be consistent estimators of the corresponding class conditional densities. Moreover, our experiments on both tabular and vision benchmarks show that the proposed approaches obtain well-calibrated posteriors while mostly preserving or improving the classification accuracy of the original algorithm in the in-distribution region, and extrapolate beyond the training data to handle out-of-distribution inputs appropriately.
    Multi-label Node Classification On Graph-Structured Data. (arXiv:2304.10398v3 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
    A Quasi-Wasserstein Loss for Learning Graph Neural Networks. (arXiv:2310.11762v2 [cs.LG] UPDATED)
    When learning graph neural networks (GNNs) in node-level prediction tasks, most existing loss functions are applied for each node independently, even if node embeddings and their labels are non-i.i.d. because of their graph structures. To eliminate such inconsistency, in this study we propose a novel Quasi-Wasserstein (QW) loss with the help of the optimal transport defined on graphs, leading to new learning and prediction paradigms of GNNs. In particular, we design a "Quasi-Wasserstein" distance between the observed multi-dimensional node labels and their estimations, optimizing the label transport defined on graph edges. The estimations are parameterized by a GNN in which the optimal label transport may determine the graph edge weights optionally. By reformulating the strict constraint of the label transport to a Bregman divergence-based regularizer, we obtain the proposed Quasi-Wasserstein loss associated with two efficient solvers learning the GNN together with optimal label transport. When predicting node labels, our model combines the output of the GNN with the residual component provided by the optimal label transport, leading to a new transductive prediction paradigm. Experiments show that the proposed QW loss applies to various GNNs and helps to improve their performance in node-level classification and regression tasks.
    Blind quantum machine learning with quantum bipartite correlator. (arXiv:2310.12893v1 [quant-ph])
    Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.
    Prompt Injection Attacks and Defenses in LLM-Integrated Applications. (arXiv:2310.12815v1 [cs.CR])
    Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world applications called LLM-Integrated Applications. Multiple recent works showed that LLM-Integrated Applications are vulnerable to prompt injection attacks, in which an attacker injects malicious instruction/data into the input of those applications such that they produce results as the attacker desires. However, existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a general framework to formalize prompt injection attacks. Existing attacks, which are discussed in research papers and blog posts, are special cases in our framework. Our framework enables us to design a new attack by combining existing attacks. Moreover, we also propose a framework to systematize defenses against prompt injection attacks. Using our frameworks, we conduct a systematic evaluation on prompt injection attacks and their defenses with 10 LLMs and 7 tasks. We hope our frameworks can inspire future research in this field. Our code is available at https://github.com/liu00222/Open-Prompt-Injection.
    Fine-Tuning Generative Models as an Inference Method for Robotic Tasks. (arXiv:2310.12862v1 [cs.LG])
    Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
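As a rough illustration of the cross-entropy method named in the abstract above, the sketch below adapts a one-dimensional Gaussian "generative model" to a single observation. It is a toy stand-in, not the paper's method for deep generative models: the forward map `forward`, the function `cem_fit`, and all hyperparameters are hypothetical.

```python
import numpy as np

def forward(z):
    # Hypothetical observation model: maps a latent value to a measurement.
    return 2.0 * z + 1.0

def cem_fit(observation, mu=0.0, sigma=2.0, n_samples=200, n_elite=20,
            n_iters=30, seed=0):
    """Cross-entropy method: sample, keep the best matches, refit, repeat."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        z = rng.normal(mu, sigma, size=n_samples)       # generate samples
        error = np.abs(forward(z) - observation)        # mismatch with evidence
        elite = z[np.argsort(error)[:n_elite]]          # samples matching best
        mu = float(elite.mean())                        # refit the model
        sigma = max(float(elite.std()), 1e-3)           # floor avoids collapse to 0
    return mu, sigma

mu, sigma = cem_fit(observation=7.0)
print(f"adapted mean: {mu:.3f}, std: {sigma:.3f}")
```

Here the refit step is a closed-form maximum-likelihood update of a Gaussian. For a neural generative model, the corresponding step would be a few gradient updates on the elite samples, which is the part that benefits from GPU acceleration.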
    Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data. (arXiv:1811.11479v2 [cs.LG] UPDATED)
    On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. To this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to become non-IID across devices, which commonly degrades the performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where each device collectively trains a generative model, and thereby augments its local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.
    Model-agnostic variable importance for predictive uncertainty: an entropy-based approach. (arXiv:2310.12842v1 [stat.ML])
    In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.
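One way to picture permutation feature importance applied to predictive uncertainty is sketched below. It is an assumption-laden illustration rather than the paper's procedure: the importance is taken here to be the mean absolute per-sample change in predictive entropy after shuffling a feature, and `toy_predict_proba` is a hypothetical classifier that only looks at the first feature.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def entropy_permutation_importance(predict_proba, X, n_repeats=20, seed=0):
    """Mean absolute change in per-sample entropy when a feature is shuffled."""
    rng = np.random.default_rng(seed)
    base = predictive_entropy(predict_proba(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        changes = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break feature j
            shuffled = predictive_entropy(predict_proba(X_perm))
            changes.append(np.mean(np.abs(shuffled - base)))
        importances[j] = np.mean(changes)
    return importances

def toy_predict_proba(X):
    # Hypothetical binary classifier that depends on feature 0 only.
    p1 = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))
    return np.column_stack([1.0 - p1, p1])

X = np.random.default_rng(1).normal(size=(200, 2))
importances = entropy_permutation_importance(toy_predict_proba, X)
print(importances)
```

The per-sample comparison matters: shuffling a column only reorders the predictions of a model that depends on that column alone, so the dataset-average entropy would stay unchanged even for an influential feature.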
    Neural networks with linear threshold activations: structure and algorithms. (arXiv:2111.08117v4 [cs.LG] UPDATED)
    In this article we present new results on neural networks with linear threshold activation functions. We precisely characterize the class of functions that are representable by such neural networks and show that 2 hidden layers are necessary and sufficient to represent any function representable in the class. This is a surprising result in the light of recent exact representability investigations for neural networks using other popular activation functions like rectified linear units (ReLU). We also give precise bounds on the sizes of the neural networks required to represent any function in the class. Finally, we design an algorithm to solve the empirical risk minimization (ERM) problem to global optimality for these neural networks with a fixed architecture. The algorithm's running time is polynomial in the size of the data sample, if the input dimension and the size of the network architecture are considered fixed constants. The algorithm is unique in the sense that it works for any architecture with any number of layers, whereas previous polynomial time globally optimal algorithms work only for very restricted classes of architectures. Using these insights, we propose a new class of neural networks that we call shortcut linear threshold networks. To the best of our knowledge, this way of designing neural networks has not been explored before in the literature. We show that these neural networks have several desirable theoretical properties.
    Generating collective counterfactual explanations in score-based classification via mathematical optimization. (arXiv:2310.12822v1 [stat.ML])
    Due to the increasing use of Machine Learning models in high stakes decision making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical to the entire dataset to have the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.
    Hierarchical Forecasting at Scale. (arXiv:2310.12809v1 [cs.LG])
    Existing hierarchical forecasting techniques scale poorly when the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it provides practitioners a method of producing bottom-level forecasts that are coherent to any chosen cross-sectional or temporal hierarchy. In addition, removing the need for a post-processing step as required in traditional hierarchical forecasting techniques reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% (RMSE) better compared to the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level. Finally, we found an increase in forecasting performance of about 5-10% when evaluating the forecasting performance across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
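The idea of scoring bottom-level forecasts against every level of a hierarchy can be sketched with a summing matrix. This is a minimal illustration under assumed names and data, not the loss implemented in the paper: the hierarchy `S`, the function `hierarchical_mse`, and the numbers are hypothetical, and a dense array is used where a sparse matrix would be needed at scale.

```python
import numpy as np

# Hypothetical hierarchy: 4 products, 2 categories, 1 total.
# Each row of S sums the bottom-level series that belong to one node.
S = np.array([
    [1, 1, 1, 1],   # total
    [1, 1, 0, 0],   # category A
    [0, 0, 1, 1],   # category B
    [1, 0, 0, 0],   # product 1
    [0, 1, 0, 0],   # product 2
    [0, 0, 1, 0],   # product 3
    [0, 0, 0, 1],   # product 4
], dtype=float)

def hierarchical_mse(bottom_forecast, bottom_actual, S):
    """Squared error averaged over every node of the hierarchy."""
    agg_forecast = S @ bottom_forecast   # forecasts at all levels
    agg_actual = S @ bottom_actual       # actuals at all levels
    return float(np.mean((agg_forecast - agg_actual) ** 2))

forecast = np.array([10.0, 12.0, 7.0, 5.0])
actual = np.array([11.0, 11.0, 8.0, 6.0])

all_levels = S @ forecast
loss = hierarchical_mse(forecast, actual, S)
print(all_levels, loss)
```

Because every upper-level forecast is computed by aggregating the bottom-level ones, the result is coherent by construction and no reconciliation step is needed after prediction.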
    A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier. (arXiv:2310.12785v1 [cs.LG])
    While the accuracy-fairness trade-off has been frequently observed in the literature of fair machine learning, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work seeks to develop a theoretical framework by characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by a set of all optimal Pareto classifiers that no other classifiers can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to corresponding trade-offs. Experimental results on synthetic data suggest insightful findings of the proposed framework: (1) When sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous. (2) Accuracy can suffer a sharp decline when over-pursuing fairness. (3) The trade-off can be eliminated via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine-learning research to a new frontier.
    Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair. (arXiv:2309.00608v2 [cs.SE] UPDATED)
    During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantic constraints of the target programming language. This results in plenty of statically invalid generated patches, impeding the practicality of the technique. Therefore, we propose Repilot, a framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), resembling how humans write programs, which can be significantly boosted and guided through a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs fixed. More importantly, Repilot is capable of producing more valid and correct patches than the base LLM when given the same generation budget.
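The two interactions described above, pruning infeasible tokens and proactively completing a token, can be illustrated with a toy decoder. This is not Repilot's implementation: the "completion engine" is reduced to a set of known identifiers, tokens are single characters, and `toy_llm_scores` is a hypothetical stand-in for model probabilities.

```python
def feasible_next(prefix, identifiers):
    """Toy completion engine: characters that keep `prefix` a valid prefix."""
    return {name[len(prefix)] for name in identifiers
            if name.startswith(prefix) and len(name) > len(prefix)}

def guided_decode(llm_scores, identifiers, max_steps=32):
    prefix = ""
    for _ in range(max_steps):
        if prefix in identifiers:
            break                                   # a full identifier was produced
        candidates = [n for n in identifiers if n.startswith(prefix)]
        if len(candidates) == 1:
            prefix = candidates[0]                  # 2) proactive completion
            break
        allowed = feasible_next(prefix, identifiers)
        if not allowed:
            break
        scores = llm_scores(prefix)
        # 1) prune: choose the best-scoring token among feasible ones only
        prefix += max(allowed, key=lambda ch: scores.get(ch, float("-inf")))
    return prefix

def toy_llm_scores(prefix):
    # Hypothetical scores; 'x' is the model's favourite but is never feasible.
    return {"x": 0.9, "g": 0.6, "V": 0.5, "N": 0.4, "s": 0.3}

identifiers = {"getName", "getValue", "size"}
result = guided_decode(toy_llm_scores, identifiers)
print(result)
```

In this run the decoder never emits the highest-scoring character `x`, because no known identifier continues with it, and it jumps straight to the full name once only one candidate remains.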
    Bayesian tomography using polynomial chaos expansion and deep generative networks. (arXiv:2307.04228v4 [physics.geo-ph] UPDATED)
    Implementations of Markov chain Monte Carlo (MCMC) methods need to confront two fundamental challenges: accurate representation of prior information and efficient evaluation of likelihoods. Principal component analysis (PCA) and related techniques can in some cases facilitate the definition and sampling of the prior distribution, as well as the training of accurate surrogate models, using, for instance, polynomial chaos expansion (PCE). However, complex geological priors with sharp contrasts necessitate more complex dimensionality-reduction techniques, such as deep generative models (DGMs). By sampling a low-dimensional prior probability distribution defined in the low-dimensional latent space of such a model, it becomes possible to efficiently sample the physical domain at the price of a generator that is typically highly non-linear. Training a surrogate that is capable of capturing intricate non-linear relationships between latent parameters and outputs of forward modeling presents a notable challenge. Indeed, while PCE models provide high accuracy when the input-output relationship can be effectively approximated by relatively low-degree multivariate polynomials, this condition is typically not met when employing latent variables derived from DGMs. In this contribution, we present a strategy combining the excellent reconstruction performance of a variational autoencoder (VAE) with the accuracy of PCA-PCE surrogate modeling in the context of Bayesian ground penetrating radar (GPR) traveltime tomography. Within the MCMC process, the parametrization of the VAE is leveraged for prior exploration and sample proposals. Concurrently, surrogate modeling is conducted using PCE, which operates on either globally or locally defined principal components of the VAE samples under examination.
    Recurrent Neural Language Models as Probabilistic Finite-state Automata. (arXiv:2310.05161v2 [cs.CL] UPDATED)
    Studying language models (LMs) in terms of well-understood formalisms allows us to precisely characterize their abilities and limitations. Previous work has investigated the representational capacity of recurrent neural network (RNN) LMs in terms of their capacity to recognize unweighted formal languages. However, LMs do not describe unweighted formal languages -- rather, they define probability distributions over strings. In this work, we study what classes of such probability distributions RNN LMs can represent, which allows us to make more direct statements about their capabilities. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata, and can thus model a strict subset of probability distributions expressible by finite-state models. Furthermore, we study the space complexity of representing finite-state LMs with RNNs. We show that, to represent an arbitrary deterministic finite-state LM with $N$ states over an alphabet $\Sigma$, an RNN requires $\Omega\left(N |\Sigma|\right)$ neurons. These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.
    Neurosymbolic Grounding for Compositional World Models. (arXiv:2310.12690v1 [cs.LG])
    We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CG), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.
    Rank-DETR for High Quality Object Detection. (arXiv:2310.08854v2 [cs.CV] UPDATED)
    Modern detection transformers (DETRs) use a set of object queries to predict a list of bounding boxes, sort them by their classification confidence scores, and select the top-ranked predictions as the final detection results for the given input image. A highly performant object detector requires accurate ranking for the bounding box predictions. For DETR-based detectors, the top-ranked bounding boxes suffer from less accurate localization quality due to the misalignment between classification scores and localization accuracy, thus impeding the construction of high-quality detectors. In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs, collectively called Rank-DETR. Our key contributions include: (i) a rank-oriented architecture design that can prompt positive predictions and suppress the negative ones to ensure lower false positive rates, as well as (ii) a rank-oriented loss function and matching cost design that prioritizes predictions with more accurate localization during ranking to boost the AP under high IoU thresholds. We apply our method to improve the recent SOTA methods (e.g., H-DETR and DINO-DETR) and report strong COCO object detection results when using different backbones such as ResNet-50, Swin-T, and Swin-L, demonstrating the effectiveness of our approach. Code is available at https://github.com/LeapLabTHU/Rank-DETR.
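The ranking step and the score/localization misalignment described in this abstract can be illustrated in plain Python. This is a sketch, not code from Rank-DETR: the helper names `iou` and `top_k`, the `(x1, y1, x2, y2)` box format, and the toy predictions are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k(predictions, k):
    """Sort (box, score) pairs by classification score and keep the k best."""
    return sorted(predictions, key=lambda p: p[1], reverse=True)[:k]


# Hypothetical toy data: one ground-truth box and three predictions.
ground_truth = (0.0, 0.0, 10.0, 10.0)
predictions = [
    ((2.0, 2.0, 12.0, 12.0), 0.95),  # most confident, but shifted
    ((0.0, 0.0, 10.0, 9.0), 0.80),   # less confident, better localized
    ((5.0, 5.0, 15.0, 15.0), 0.30),
]

ranked = top_k(predictions, k=2)
ranked_ious = [iou(box, ground_truth) for box, _ in ranked]
```

In this toy case the box ranked first by confidence overlaps the ground truth less than the box ranked second, which is the kind of misalignment the abstract refers to.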
    Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff. (arXiv:2310.12671v1 [cs.LG])
    Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.
    Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation. (arXiv:2307.09688v2 [cs.IR] UPDATED)
    Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement. Thus, accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which utilizes customer session data to predict their next interaction, has become increasingly popular. However, existing session datasets have limitations in terms of item attributes, user diversity, and dataset scale. As a result, they cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can help us enhance personalization and understanding of user preferences, which can benefit various existing tasks as well as enable new tasks. To test the potential of the dataset, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With the above tasks, we benchmark a range of algorithms on our proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition in the KDD CUP 2023 and have attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.
    zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning. (arXiv:2310.02554v3 [cs.AI] UPDATED)
    Federated Learning (FL) is a machine learning paradigm, which enables multiple and decentralized clients to collaboratively train a model under the orchestration of a central aggregator. Traditional FL solutions rely on the trust assumption of the centralized aggregator, which forms cohorts of clients in a fair and honest manner. However, a malicious aggregator, in reality, could abandon and replace the client's training models, or launch Sybil attacks to insert fake clients. Such malicious behaviors give the aggregator more power to control clients in the FL setting and determine the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs (ZKPs) to tackle the issue of a malicious aggregator during the training model aggregation process. To guarantee the correct aggregation results, the aggregator needs to provide a proof per round. The proof can demonstrate to the clients that the aggregator executes the intended behavior faithfully. To further reduce the verification cost of clients, we employ a blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the nodes validating and maintaining the blockchain data) can verify the proof without knowing the clients' local and aggregated models. The theoretical analysis and empirical results show that zkFL can achieve better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
    An effective theory of collective deep learning. (arXiv:2310.12802v1 [physics.soc-ph])
    Unraveling the emergence of collective learning in systems of coupled artificial neural networks is an endeavor with broader implications for physics, machine learning, neuroscience and society. Here we introduce a minimal model that condenses several recent decentralized algorithms by considering a competition between two terms: the local learning dynamics in the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters of the ensemble. We derive the coarse-grained behavior of our model via an effective theory for linear networks that we show is analogous to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts (depth-dependent) disorder-order-disorder phase transitions in the parameters' solutions that reveal the onset of a collective learning phase, along with a depth-induced delay of the critical point and a robust shape of the microscopic learning path. We validate our theory in realistic ensembles of coupled nonlinear networks trained in the MNIST dataset under privacy constraints. Interestingly, experiments confirm that individual networks -- trained only with private data -- can fully generalize to unseen data classes when the collective learning phase emerges. Our work elucidates the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.
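The competition between local learning and diffusive coupling can be sketched with a toy ensemble. This is not the paper's model: scalar parameters, private quadratic losses, coupling through the ensemble mean (a complete graph), and the names `coupled_step`, `train`, and `spread` are all assumptions chosen to keep the example small.

```python
def coupled_step(params, grads, lr, coupling):
    """One update of an ensemble of scalar parameters.

    Each unit follows its own gradient (local learning) and is pulled
    towards the ensemble mean (diffusive coupling on a complete graph).
    """
    mean = sum(params) / len(params)
    return [
        p - lr * g + coupling * (mean - p)
        for p, g in zip(params, grads)
    ]


def train(targets, lr, coupling, steps):
    """Each unit i minimizes its private loss 0.5 * (theta - targets[i]) ** 2."""
    params = [0.0] * len(targets)
    for _ in range(steps):
        grads = [p - t for p, t in zip(params, targets)]
        params = coupled_step(params, grads, lr, coupling)
    return params


def spread(params):
    """Distance between the largest and smallest parameter in the ensemble."""
    return max(params) - min(params)


targets = [-1.0, 0.0, 2.0]  # hypothetical private optima
independent = train(targets, lr=0.1, coupling=0.0, steps=200)
coupled = train(targets, lr=0.1, coupling=0.5, steps=200)
```

Without coupling each unit settles at its own private optimum. With coupling the parameters are drawn together, so the spread of the ensemble shrinks.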
    Provably Powerful Graph Neural Networks for Directed Multigraphs. (arXiv:2306.11586v2 [cs.LG] UPDATED)
    This paper analyses a set of simple adaptations that transform standard message-passing Graph Neural Networks (GNN) into provably powerful directed multigraph neural networks. The adaptations include multigraph port numbering, ego IDs, and reverse message passing. We prove that the combination of these theoretically enables the detection of any directed subgraph pattern. To validate the effectiveness of our proposed adaptations in practice, we conduct experiments on synthetic subgraph detection tasks, which demonstrate outstanding performance with almost perfect results. Moreover, we apply our proposed adaptations to two financial crime analysis tasks. We observe dramatic improvements in detecting money laundering transactions, improving the minority-class F1 score of a standard message-passing GNN by up to 30%, and closely matching or outperforming tree-based and GNN baselines. Similarly impressive results are observed on a real-world phishing detection dataset, boosting three standard GNNs' F1 scores by around 15% and outperforming all baselines.
    Neural Likelihood Approximation for Integer Valued Time Series Data. (arXiv:2310.12544v1 [stat.ML])
    Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is difficult due to intractability of the likelihood; current methods, based on simulations of the underlying model, can be so computationally expensive as to be prohibitive. In this paper we construct a neural likelihood approximation for integer valued time series data using causal convolutions, which allows us to evaluate the likelihood of the whole time series in parallel. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups in situations where current methods struggle.
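The property that makes causal convolutions suitable for evaluating a whole series at once is that the output at time t never depends on later observations. A minimal sketch in plain Python follows; the function name `causal_conv1d`, the kernel, and the counts are illustrative assumptions, not the paper's architecture.

```python
def causal_conv1d(x, kernel):
    """Compute y[t] = sum_j kernel[j] * x[t - j], using only x[0..t]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:  # positions before the start of the series count as zero
                acc += w * x[t - j]
        out.append(acc)
    return out


counts = [3, 5, 4, 6]          # hypothetical integer-valued time series
kernel = [0.5, 0.5]            # hypothetical filter weights
features = causal_conv1d(counts, kernel)

# Changing the last observation leaves all earlier outputs untouched.
perturbed = causal_conv1d([3, 5, 4, 100], kernel)
```

Because every output only looks backwards, the per-time-step terms can be computed for all t in a single pass rather than by sequential simulation.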
    Inverse Renormalization Group of Disordered Systems. (arXiv:2310.12631v1 [cond-mat.stat-mech])
    We propose inverse renormalization group transformations to construct approximate configurations for lattice volumes that have not yet been accessed by supercomputers or large-scale simulations in the study of spin glasses. Specifically, starting from lattices of volume $V=8^{3}$ in the case of the three-dimensional Edwards-Anderson model we employ machine learning algorithms to construct rescaled lattices up to $V'=128^{3}$, which we utilize to extract two critical exponents. We conclude by discussing how to incorporate numerical exactness within inverse renormalization group approaches of disordered systems, thus opening up the opportunity to explore a sustainable and energy-efficient generation of exact configurations for increasing lattice volumes without the use of dedicated supercomputers.
    Compression of Recurrent Neural Networks using Matrix Factorization. (arXiv:2310.12688v1 [cs.LG])
    Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with no or little performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.
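The link between the chosen rank and the compression achieved for a single matrix is simple arithmetic: replacing a rows x cols matrix with two factors of rank r stores r * (rows + cols) parameters instead of rows * cols. The sketch below shows only this bookkeeping. It is not the Rank-Tuning procedure, and the function names and the 1024 x 1024 example are assumptions.

```python
def factorized_params(rows, cols, rank):
    """Parameters in U (rows x rank) plus V (rank x cols)."""
    return rank * (rows + cols)


def compression_ratio(rows, cols, rank):
    """Dense parameter count divided by the factorized parameter count."""
    return (rows * cols) / factorized_params(rows, cols, rank)


def max_rank_for_ratio(rows, cols, target_ratio):
    """Largest rank whose factorization still reaches target_ratio."""
    return int((rows * cols) / (target_ratio * (rows + cols)))


ratio = compression_ratio(1024, 1024, 32)
rank = max_rank_for_ratio(1024, 1024, 14)
```

A per-matrix ratio like this differs from a whole-network figure, since each matrix may be given a different rank.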
    Networkwide Traffic State Forecasting Using Exogenous Information: A Multi-Dimensional Graph Attention-Based Approach. (arXiv:2310.12353v1 [cs.LG])
    Traffic state forecasting is crucial for traffic management and control strategies, as well as user- and system-level decision making in the transportation network. While traffic forecasting has been approached with a variety of techniques over the last couple of decades, most approaches simply rely on endogenous traffic variables for state prediction, despite the evidence that exogenous factors can significantly impact traffic conditions. This paper proposes a multi-dimensional spatio-temporal graph attention-based traffic prediction approach (M-STGAT), which predicts traffic based on past observations of speed, along with lane closure events, temperature, and visibility across the transportation network. The approach is based on a graph attention network architecture, which also learns based on the structure of the transportation network on which these variables are observed. Numerical experiments are performed using traffic speed and lane closure data from the California Department of Transportation (Caltrans) Performance Measurement System (PeMS). The corresponding weather data were downloaded from the National Oceanic and Atmospheric Administration (NOAA) Automated Surface Observing Systems (ASOS). For comparison, the numerical experiments implement three alternative models which do not allow for the multi-dimensional input. The M-STGAT is shown to outperform the three alternative models, when performing tests using our primary data set for prediction with a 30-, 45-, and 60-minute prediction horizon, in terms of three error measures: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). However, the model's transferability can vary for different transfer data sets and this aspect may require further investigation.
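The three error measures named in this abstract have standard definitions, sketched here in plain Python. The function names and the speed values are illustrative assumptions, and `mape` as written assumes no true value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (true values must be nonzero)."""
    return 100.0 * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical observed and predicted speeds at three sensors.
observed = [50.0, 60.0, 40.0]
predicted = [45.0, 66.0, 40.0]

errors = (mae(observed, predicted),
          rmse(observed, predicted),
          mape(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the observed value, which is why the three are often reported together.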
    Time-Aware Representation Learning for Time-Sensitive Question Answering. (arXiv:2310.12585v1 [cs.CL])
    Time is one of the crucial factors in real-world question answering (QA) problems. However, language models have difficulty understanding the relationships between time specifiers, such as 'after' and 'before', and numbers, since existing QA datasets do not include sufficient time expressions. To address this issue, we propose a Time-Context aware Question Answering (TCQA) framework. We suggest a Time-Context dependent Span Extraction (TCSE) task, and build a time-context dependent data generation framework for model training. Moreover, we present a metric to evaluate the time awareness of the QA model using TCSE. The TCSE task consists of a question and four sentence candidates classified as correct or incorrect based on time and context. The model is trained to extract the answer span from the sentence that is correct in both time and context. The model trained with TCQA outperforms baseline models by up to 8.5 F1 points on the TimeQA dataset. Our dataset and code are available at https://github.com/sonjbin/TCQA
    Voyager: An Open-Ended Embodied Agent with Large Language Models. (arXiv:2305.16291v2 [cs.AI] UPDATED)
    We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
    An ML-assisted OTFS vs. OFDM adaptable modem. (arXiv:2309.01319v2 [eess.SP] UPDATED)
    The Orthogonal-Time-Frequency-Space (OTFS) signaling is known to be resilient to doubly-dispersive channels, which impacts high mobility scenarios. On the other hand, the Orthogonal-Frequency-Division-Multiplexing (OFDM) waveforms enjoy the benefits of the reuse of legacy architectures, simplicity of receiver design, and low-complexity detection. Several studies that compare the performance of OFDM and OTFS have indicated mixed outcomes due to the plethora of system parameters at play beyond high-mobility conditions. In this work, we exemplify this observation using simulations and propose a deep neural network (DNN)-based adaptation scheme to switch between using either an OTFS or OFDM signal processing chain at the transmitter and receiver for optimal mean-squared-error (MSE) performance. The DNN classifier is trained to switch between the two schemes by observing the channel condition, received SNR, and modulation format. We compare the performance of the OTFS, OFDM, and the proposed switched-waveform scheme. The simulations indicate superior performance with the proposed scheme with a well-trained DNN, thus improving the MSE performance of the communication significantly.
    Detection and Evaluation of bias-inducing Features in Machine learning. (arXiv:2310.12805v1 [cs.LG])
    The cause-to-effect analysis can help us decompose all the likely causes of a problem, such as an undesirable business situation or unintended harm to the individual(s). This implies that we can identify how the problems are inherited, rank the causes to help prioritize fixes, simplify a complex problem and visualize them. In the context of machine learning (ML), one can use cause-to-effect analysis to understand the reason for the biased behavior of the system. For example, we can examine the root causes of biases by checking each feature for a potential cause of bias in the model. To approach this, one can apply small changes to a given feature or a pair of features in the data, following some guidelines and observing how it impacts the decision made by the model (i.e., model prediction). Therefore, we can use cause-to-effect analysis to identify the potential bias-inducing features, even when these features are originally unknown. This is important since most current methods require a pre-identification of sensitive features for bias assessment and can actually miss other relevant bias-inducing features, which is why systematic identification of such features is necessary. Moreover, it often occurs that to achieve an equitable outcome, one has to take into account sensitive features in the model decision. Therefore, it should be up to the domain experts to decide based on their knowledge of the context of a decision whether bias induced by specific features is acceptable or not. In this study, we propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts. We evaluated our technique using four well-known datasets to showcase how our contribution can help spearhead the standard procedure when developing, testing, maintaining, and deploying fair/equitable machine learning systems.
    Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic. (arXiv:2310.12660v1 [cs.LG])
    Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. A set of functions of the form $x\to ax \bmod p$, where $a$ is taken from ${\mathbb Z}_p$, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of $p$-periodic functions on ${\mathbb Z}$ and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques to train a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either a frequency or the prime base $p$ is large. This in turn prevents such a learning algorithm from being successful.
    Red Teaming Language Model Detectors with Language Models. (arXiv:2305.19713v2 [cs.CL] UPDATED)
    The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent works have proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Different from previous works, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems.
    Test-Time Distribution Normalization for Contrastively Learned Vision-language Models. (arXiv:2302.11084v2 [cs.LG] UPDATED)
    Advances in the field of vision-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations. One of the most representative approaches proposed recently known as CLIP has garnered widespread adoption due to its effectiveness. CLIP is trained with an InfoNCE loss that takes into account both positive and negative samples to help learn a much more robust representation space. This paper reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information during test-time. Intuitively, since the model has been optimized based on the InfoNCE loss, test-time procedures should also be in alignment. The question lies in how one can retrieve any semblance of negative samples information during inference in a computationally efficient way. To this end, we propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use such a mean to represent what would be analogous to negative samples in the InfoNCE loss. DN requires no retraining or fine-tuning and can be effortlessly applied during inference. Extensive experiments on a wide variety of downstream tasks exhibit a clear advantage of DN over the dot product on top of other existing test-time augmentation methods.
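The idea of using a batch mean as a stand-in for negative samples can be sketched as follows. The abstract does not state the formula, so this is one plausible reading rather than the authors' definition: the names `dn_score` and `mean_vector`, and in particular the shift weight `lam` with its default of 0.5, are assumptions.

```python
def dot(u, v):
    """Plain dot product, the usual test-time similarity score."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Element-wise mean of a batch of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dn_score(image, text, image_mean, text_mean, lam=0.5):
    """Similarity after shifting each embedding by a fraction of its batch mean.

    `lam` is a hypothetical weight; lam=0 recovers the plain dot product.
    """
    shifted_image = [a - lam * m for a, m in zip(image, image_mean)]
    shifted_text = [b - lam * m for b, m in zip(text, text_mean)]
    return dot(shifted_image, shifted_text)


# Hypothetical two-dimensional embeddings for a small test batch.
image_batch = [[1.0, 0.0], [1.0, 2.0]]
text_batch = [[0.0, 1.0], [2.0, 1.0]]
image_mean = mean_vector(image_batch)
text_mean = mean_vector(text_batch)

plain = dot(image_batch[0], text_batch[0])
normalized = dn_score(image_batch[0], text_batch[0], image_mean, text_mean)
```

The means are estimated once from test samples and reused for every pair, so the extra cost at inference is a vector subtraction per embedding, with no retraining.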
    Post-processing Private Synthetic Data for Improving Utility on Selected Measures. (arXiv:2305.15538v2 [cs.LG] UPDATED)
    Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end users may have specific requirements that the synthetic data must satisfy. Failure to meet these requirements could significantly reduce the utility of the data for downstream use. We introduce a post-processing technique that improves the utility of the synthetic data with respect to measures selected by the end user, while preserving strong privacy guarantees and dataset quality. Our technique involves resampling from the synthetic data to filter out samples that do not meet the selected utility measures, using an efficient stochastic first-order algorithm to find optimal resampling weights. Through comprehensive numerical experiments, we demonstrate that our approach consistently improves the utility of synthetic data across multiple benchmark datasets and state-of-the-art synthetic data generation algorithms.
    Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter. (arXiv:2303.14090v2 [astro-ph.CO] UPDATED)
    Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieval of scatter that traces the target simulation's distribution.
    Towards a Deep Learning-based Online Quality Prediction System for Welding Processes. (arXiv:2310.12632v1 [cs.LG])
    The digitization of manufacturing processes enables promising applications for machine learning-assisted quality assurance. A widely used manufacturing process that can strongly benefit from data-driven solutions is gas metal arc welding (GMAW). The welding process is characterized by complex cause-effect relationships between material properties, process conditions and weld quality. In non-laboratory environments with frequently changing process parameters, accurate determination of weld quality by destructive testing is economically unfeasible. Deep learning offers the potential to identify the relationships in available process data and predict the weld quality from process observations. In this paper, we present a concept for a deep learning based predictive quality system in GMAW. At its core, the concept involves a pipeline consisting of four major phases: collection and management of multi-sensor data (e.g. current and voltage), real-time processing and feature engineering of the time series data by means of autoencoders, training and deployment of suitable recurrent deep learning models for quality predictions, and model evolutions under changing process conditions using continual learning. The concept provides the foundation for future research activities in which we will realize an online predictive quality system for running production.
    Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark. (arXiv:2310.12567v1 [cs.AI])
    Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns. Safe reinforcement learning (SafeRL) emerges as a solution to optimize policies while simultaneously adhering to multiple constraints, thereby addressing the challenge of integrating reinforcement learning in safety-critical scenarios. In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. Additionally, we offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms. This comprehensive library can serve as a validation tool for the research community. By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications. The website of this project can be accessed at https://sites.google.com/view/safety-gymnasium.
    Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. (arXiv:2304.06762v2 [cs.CL] UPDATED)
    Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT in both fine-tuning and zero-shot evaluation settings. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models. We release our implementation at: https://github.com/NVIDIA/Megatron-LM#retro.
    REVAMP: Automated Simulations of Adversarial Attacks on Arbitrary Objects in Realistic Scenes. (arXiv:2310.12243v1 [cs.LG])
    Deep learning models, such as those used in autonomous vehicles, are vulnerable to adversarial attacks where an attacker could place an adversarial object in the environment, leading to mis-classification. Generating these adversarial objects in the digital space has been extensively studied; however, successfully transferring these attacks from the digital realm to the physical realm has proven challenging when controlling for real-world environmental factors. In response to these limitations, we introduce REVAMP, an easy-to-use Python library that is the first-of-its-kind tool for creating attack scenarios with arbitrary objects and simulating realistic environmental factors, lighting, reflection, and refraction. REVAMP enables researchers and practitioners to swiftly explore various scenarios within the digital realm by offering a wide range of configurable options for designing experiments and using differentiable rendering to reproduce physically plausible adversarial objects. We will demonstrate and invite the audience to try REVAMP to produce an adversarial texture on a chosen object while having control over various scene parameters. The audience will choose a scene, an object to attack, the desired attack class, and the number of camera positions to use. Then, in real time, we show how this altered texture causes the chosen object to be mis-classified, showcasing the potential of REVAMP in real-world scenarios. REVAMP is open-source and available at https://github.com/poloclub/revamp.
    Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models. (arXiv:2304.12526v2 [cs.CV] UPDATED)
    Diffusion models are powerful, but they require a lot of time and data to train. We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode the cross-region dependency at multiple scales. Sampling with our method is as easy as in the original diffusion model. Through Patch Diffusion, we could achieve ≥2× faster training, while maintaining comparable or better generation quality. Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch. We achieve outstanding FID scores in line with state-of-the-art benchmarks: 1.77 on CelebA-64×64, 1.93 on AFHQv2-Wild-64×64, and 2.72 on ImageNet-256×256. We share our code and pre-trained models at https://github.com/Zhendong-Wang/Patch-Diffusion.
    On the Optimization and Generalization of Multi-head Attention. (arXiv:2310.12680v1 [cs.LG])
    The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.
    A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs. (arXiv:2310.12248v1 [cs.LG])
    Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.
    Personalized human mobility prediction for HuMob challenge. (arXiv:2310.12900v1 [cs.LG])
    We explain the methodology used to create the data submitted to the HuMob Challenge, a data analysis competition for human mobility prediction. We adopted a personalized model that predicts an individual's movement trajectory from that individual's own data, instead of predicting from the overall movement, based on the hypothesis that human movement is unique to each person. We devised features such as the date and time, activity time, day of the week, time of day, and frequency of visits to POIs (Points of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns through clustering. The machine learning model we adopted was Support Vector Regression (SVR). We assessed accuracy offline and carried out feature selection and parameter tuning. Although the provided dataset consists of the trajectories of 100,000 users, our method uses only the data of the 20,000 target users and does not need the data of the other 80,000. Despite the personalized model's traditional feature engineering approach, this model yields reasonably good accuracy with lower computational cost.
    Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning. (arXiv:2310.12609v1 [cs.RO])
    Diffusion models have risen as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance.
    Towards Better Dynamic Graph Learning: New Architecture and Unified Library. (arXiv:2303.13047v3 [cs.LG] UPDATED)
    We propose DyGFormer, a new Transformer-based architecture for dynamic graph learning. DyGFormer is conceptually simple and only needs to learn from nodes' historical first-hop interactions by: (1) a neighbor co-occurrence encoding scheme that explores the correlations of the source node and destination node based on their historical sequences; (2) a patching technique that divides each sequence into multiple patches and feeds them to Transformer, allowing the model to effectively and efficiently benefit from longer histories. We also introduce DyGLib, a unified library with standard training pipelines, extensible coding interfaces, and comprehensive evaluating protocols to promote reproducible, scalable, and credible dynamic graph learning research. By performing exhaustive experiments on thirteen datasets for dynamic link prediction and dynamic node classification tasks, we find that DyGFormer achieves state-of-the-art performance on most of the datasets, demonstrating its effectiveness in capturing nodes' correlations and long-term temporal dependencies. Moreover, some results of baselines are inconsistent with previous reports, which may be caused by their diverse but less rigorous implementations, showing the importance of DyGLib. All the used resources are publicly available at https://github.com/yule-BUAA/DyGLib.
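The two ingredients named in the DyGFormer abstract above, neighbor co-occurrence encoding and patching, can be illustrated with a small sketch. This is only one reading of the abstract, not the DyGLib implementation: the function names `cooccurrence_features` and `to_patches` are hypothetical, and the sketch assumes the co-occurrence feature of a neighbor is simply its frequency in the source's and in the destination's historical sequence.

```python
from collections import Counter


def cooccurrence_features(src_neighbors, dst_neighbors):
    """Count, for every historical neighbor, how often it appears in the
    source's sequence and in the destination's sequence (assumed encoding)."""
    src_counts = Counter(src_neighbors)
    dst_counts = Counter(dst_neighbors)
    # A Counter returns 0 for neighbors that never appear in a sequence.
    src_feats = [(src_counts[v], dst_counts[v]) for v in src_neighbors]
    dst_feats = [(src_counts[v], dst_counts[v]) for v in dst_neighbors]
    return src_feats, dst_feats


def to_patches(sequence, patch_size):
    """Split a sequence into consecutive patches; the last one may be shorter."""
    return [sequence[i:i + patch_size] for i in range(0, len(sequence), patch_size)]


src_feats, dst_feats = cooccurrence_features(["b", "b", "c", "a"], ["a", "c", "d"])
patches = to_patches([1, 2, 3, 4, 5], patch_size=2)
```

Patching shortens the sequence a Transformer has to attend over by roughly a factor of `patch_size`, which is the sense in which it helps with longer histories.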
    Stochastic Average Gradient : A Simple Empirical Investigation. (arXiv:2310.12771v1 [cs.LG])
    Despite the recent growth of theoretical studies and empirical successes of neural networks, gradient backpropagation is still the most widely used algorithm for training such networks. On the one hand, we have deterministic or full gradient (FG) approaches that have a cost proportional to the amount of training data used but have a linear convergence rate, and on the other hand, stochastic gradient (SG) methods that have a cost independent of the size of the dataset, but have a less optimal convergence rate than the deterministic approaches. To combine the cost of the stochastic approach with the convergence rate of the deterministic approach, a stochastic average gradient (SAG) has been proposed. SAG is a method for optimizing the sum of a finite number of smooth convex functions. Like SG methods, the SAG method's iteration cost is independent of the number of terms in the sum. In this work, we propose to compare SAG to some standard optimizers used in machine learning. SAG converges faster than other optimizers on simple toy problems and performs better than many other optimizers on simple machine learning problems. We also propose a combination of SAG with the momentum algorithm and Adam. These combinations allow empirically higher speed and obtain better performance than the other methods, especially when the landscape of the function to optimize presents obstacles or is ill-conditioned.
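As a rough illustration of the SAG update described in the abstract above, the sketch below applies it to a one-dimensional least-squares problem. It is a toy under stated assumptions, not the paper's experimental code: the function name `sag_least_squares`, the zero initialization of the gradient memory, and the conservative step size 1/(16L) are all choices made for this example.

```python
import random


def sag_least_squares(a, b, steps=2000, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a[i] * w - b[i])**2 with SAG."""
    n = len(a)
    lipschitz = max(ai * ai for ai in a)   # largest per-term gradient Lipschitz constant
    step = 1.0 / (16.0 * lipschitz)        # conservative step size (assumption)
    w = 0.0
    grad_memory = [0.0] * n                # last gradient seen for each term
    grad_sum = 0.0                         # running sum of the stored gradients
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(n)               # touch one term per iteration, as in SG
        g = a[i] * (a[i] * w - b[i])       # fresh gradient of term i at the current w
        grad_sum += g - grad_memory[i]     # swap the stale gradient for the fresh one
        grad_memory[i] = g
        w -= step * grad_sum / n           # step along the average stored gradient
    return w


w_hat = sag_least_squares([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Each iteration evaluates a single term's gradient, so the per-step cost does not grow with the number of terms, while the step direction still averages information from all of them.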
    Approximate information maximization for bandit games. (arXiv:2310.12563v1 [stat.ML])
    Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.
    Testing the Consistency of Performance Scores Reported for Binary Classification Problems. (arXiv:2310.12527v1 [cs.LG])
    Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and rank classification techniques based on performance metrics such as accuracy, sensitivity, and specificity. However, reported performance scores may not always serve as a reliable basis for research ranking. This can be attributed to undisclosed or unconventional practices related to cross-validation, typographical errors, and other factors. In a given experimental setup, with a specific number of positive and negative test items, most performance scores can assume specific, interrelated values. In this paper, we introduce numerical techniques to assess the consistency of reported performance scores and the assumed experimental setup. Importantly, the proposed approach does not rely on statistical inference but uses numerical methods to identify inconsistencies with certainty. Through three different applications related to medicine, we demonstrate how the proposed techniques can effectively detect inconsistencies, thereby safeguarding the integrity of research fields. To benefit the scientific community, we have made the consistency tests available in an open-source Python package.
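The idea that reported scores must be jointly attainable for a given number of positive and negative test items can be sketched with a brute-force check. This is a simplified illustration rather than the paper's method or its package: the function name `consistent_confusion_counts` is hypothetical, only accuracy, sensitivity and specificity are handled, and the scores are assumed to be rounded to a known number of decimals.

```python
def consistent_confusion_counts(p, n, reported, decimals):
    """Return every (tp, tn) pair that reproduces the reported scores.

    p, n      -- number of positive / negative test items (both > 0)
    reported  -- dict with any of the keys "acc", "sens", "spec"
    decimals  -- number of decimals the scores were rounded to
    """
    tol = 0.5 * 10 ** (-decimals) + 1e-12   # rounding error plus float slack
    matches = []
    for tp in range(p + 1):
        for tn in range(n + 1):
            scores = {
                "acc": (tp + tn) / (p + n),
                "sens": tp / p,
                "spec": tn / n,
            }
            if all(abs(scores[key] - value) <= tol for key, value in reported.items()):
                matches.append((tp, tn))
    return matches


ok = consistent_confusion_counts(10, 20, {"acc": 0.7667, "sens": 0.8, "spec": 0.75}, 4)
bad = consistent_confusion_counts(10, 20, {"acc": 0.80, "sens": 0.8, "spec": 0.75}, 4)
```

An empty result means no confusion matrix with those class counts can produce the reported numbers, so the inconsistency is established by enumeration rather than by a statistical test.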
    A Scalable Test Problem Generator for Sequential Transfer Optimization. (arXiv:2304.08503v4 [cs.NE] UPDATED)
    Sequential transfer optimization (STO), which aims to improve the optimization performance on a task of interest by exploiting the knowledge captured from several previously-solved optimization tasks stored in a database, has been gaining increasing research attention over the years. However, despite the remarkable advances in algorithm design, the development of a systematic benchmark suite for comprehensive comparisons of STO algorithms received far less attention. Existing test problems are either simply generated by assembling other benchmark functions or extended from specific practical problems with limited scalability. The relationships between the optimal solutions of the source and target tasks in these problems are also often manually configured, limiting their ability to model different similarity relationships presented in real-world problems. Consequently, the good performance achieved by an algorithm on these problems might be biased and hard to be generalized to other problems. In light of the above, in this study, we first introduce four concepts for characterizing STO problems and present an important problem feature, namely similarity distribution, which quantitatively delineates the relationship between the optima of the source and target tasks. Then, we present the general design guidelines of STO problems and a particular STO problem generator with good scalability. Specifically, the similarity distribution of a problem can be easily customized, enabling a continuous spectrum of representation of the diverse similarity relationships of real-world problems. Lastly, a benchmark suite with 12 STO problems featured by a variety of customized similarity relationships is developed using the proposed generator. The source code of the problem generator is available at https://github.com/XmingHsueh/STOP-G.
    SemantIC: Semantic Interference Cancellation Towards 6G Wireless Communications. (arXiv:2310.12768v1 [eess.SP])
    This letter proposes a novel anti-interference technique, semantic interference cancellation (SemantIC), for enhancing information quality towards the sixth-generation (6G) wireless networks. SemantIC only requires the receiver to concatenate the channel decoder with a semantic auto-encoder. This constructs a turbo loop which iteratively and alternately eliminates noise in the signal domain and the semantic domain. From the viewpoint of network information theory, the neural network of the semantic auto-encoder stores side information by training, and provides side information in iterative decoding, as an implementation of the Wyner-Ziv theorem. Simulation results verify the performance improvement by SemantIC without extra channel resource cost.
    Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks. (arXiv:2310.12516v1 [cs.CL])
    Although remarkable progress has been achieved in preventing large language model (LLM) hallucinations using instruction tuning and retrieval augmentation, it remains challenging to measure the reliability of LLMs using human-crafted evaluation data which is not available for many tasks and domains and could suffer from data leakage. Inspired by adversarial machine learning, this paper aims to develop a method of automatically generating evaluation data by appropriately modifying existing data on which LLMs behave faithfully. Specifically, this paper presents AutoDebug, an LLM-based framework to use prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. We seek to understand the extent to which these examples trigger the hallucination behaviors of LLMs. We implement AutoDebug using ChatGPT and evaluate the resulting two variants of a popular open-domain question-answering dataset, Natural Questions (NQ), on a collection of open-source and proprietary LLMs under various prompting settings. Our generated evaluation data is human-readable and, as we show, humans can answer these modified questions well. Nevertheless, we observe pronounced accuracy drops across multiple LLMs including GPT-4. Our experimental results show that LLMs are likely to hallucinate in two categories of question-answering scenarios where (1) there are conflicts between knowledge given in the prompt and their parametric knowledge, or (2) the knowledge expressed in the prompt is complex. Finally, we find that the adversarial examples generated by our method are transferable across all considered LLMs. The examples generated by a small model can be used to debug a much larger model, making our approach cost-effective.
    Open-World Lifelong Graph Learning. (arXiv:2310.12565v1 [cs.LG])
    We study the problem of lifelong graph learning in an open-world scenario, where a model needs to deal with new tasks and potentially unknown classes. We utilize Out-of-Distribution (OOD) detection methods to recognize new classes and adapt existing non-graph OOD detection methods to graph data. Crucially, we suggest performing new class detection by combining OOD detection methods with information aggregated from the graph neighborhood. Most OOD detection methods avoid determining a crisp threshold for deciding whether a vertex is OOD. To tackle this problem, we propose a Weakly-supervised Relevance Feedback (Open-WRF) method, which decreases the sensitivity to thresholds in OOD detection. We evaluate our approach on six benchmark datasets. Our results show that the proposed neighborhood aggregation method for OOD scores outperforms existing methods independent of the underlying graph neural network. Furthermore, we demonstrate that our Open-WRF method is more robust to threshold selection and analyze the influence of graph neighborhood on OOD detection. The aggregation and threshold methods are compatible with arbitrary graph neural networks and OOD detection methods, making our approach versatile and applicable to many real-world applications.
    A Unifying Framework for Learning Argumentation Semantics. (arXiv:2310.12309v1 [cs.AI])
    Argumentation is a very active research field of Artificial Intelligence concerned with the representation and evaluation of arguments used in dialogues between humans and/or artificial agents. Acceptability semantics of formal argumentation systems define the criteria for the acceptance or rejection of arguments. Several software systems, known as argumentation solvers, have been developed to compute the accepted/rejected arguments using such criteria. These include systems that learn to identify the accepted arguments using non-interpretable methods. In this paper we present a novel framework, which uses an Inductive Logic Programming approach to learn the acceptability semantics for several abstract and structured argumentation frameworks in an interpretable way. Through an empirical evaluation we show that our framework outperforms existing argumentation solvers, thus opening up new future research directions in the area of formal argumentation and human-machine dialogues.
    Operator-Based Detecting, Learning, and Stabilizing Unstable Periodic Orbits of Chaotic Attractors. (arXiv:2310.12156v1 [nlin.AO])
    This paper examines the use of operator-theoretic approaches to the analysis of chaotic systems through the lens of their unstable periodic orbits (UPOs). Our approach involves three data-driven steps for detecting, identifying, and stabilizing UPOs. We demonstrate the use of kernel integral operators within delay coordinates as an innovative method for UPO detection. For identifying the dynamic behavior associated with each individual UPO, we utilize the Koopman operator to present the dynamics as linear equations in the space of Koopman eigenfunctions. This allows for characterizing the chaotic attractor by investigating its principal dynamical modes across varying UPOs. We extend this methodology into an interpretable machine learning framework aimed at stabilizing strange attractors on their UPOs. To illustrate the efficacy of our approach, we apply it to the Lorenz attractor as a case study.
    Category-Agnostic 6D Pose Estimation with Conditional Neural Processes. (arXiv:2206.07162v2 [cs.CV] UPDATED)
    We present a novel meta-learning approach for 6D pose estimation on unknown objects. In contrast to "instance-level" and "category-level" pose estimation methods, our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories. Specifically, we employ a neural process-based meta-learning approach to train an encoder to capture texture and geometry of an object in a latent representation, based on very few RGB-D images and ground-truth keypoints. The latent representation is then used by a simultaneously meta-trained decoder to predict the 6D pose of the object in new images. Furthermore, we propose a novel geometry-aware decoder for the keypoint prediction using a Graph Neural Network (GNN), which explicitly takes geometric constraints specific to each object into consideration. To evaluate our algorithm, extensive experiments are conducted on the LINEMOD dataset, and on our new fully-annotated synthetic datasets generated from Multiple Categories in Multiple Scenes (MCMS). Experimental results demonstrate that our model performs well on unseen objects with very different shapes and appearances. Remarkably, our model also shows robust performance on occluded scenes although trained fully on data without occlusion. To our knowledge, this is the first work exploring cross-category level 6D pose estimation.
    Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models. (arXiv:2310.12568v1 [cs.LG])
    The fast-paced development of machine learning (ML) methods coupled with its increasing adoption in research poses challenges for researchers without extensive training in ML. In neuroscience, for example, ML can help understand brain-behavior relationships, diagnose diseases, and develop biomarkers using various data sources like magnetic resonance imaging and electroencephalography. The primary objective of ML is to build models that can make accurate predictions on unseen data. Researchers aim to prove the existence of such generalizable models by evaluating performance using techniques such as cross-validation (CV), which uses systematic subsampling to estimate the generalization performance. Choosing a CV scheme and evaluating an ML pipeline can be challenging and, if done improperly, can lead to overestimated results and incorrect interpretations. We created julearn, an open-source Python library that allows researchers to design and evaluate complex ML pipelines without encountering common pitfalls. In this manuscript, we present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects that can be easily implemented using this novel library. Julearn aims to simplify the entry into the ML world by providing an easy-to-use environment with built-in guards against some of the most common ML pitfalls. With its design, unique features and simple interface, it serves as a useful Python-based library for research projects.
    Differentiable Vertex Fitting for Jet Flavour Tagging. (arXiv:2310.12804v1 [hep-ex])
    We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
    Learning threshold neurons via the "edge of stability". (arXiv:2212.07469v2 [cs.LG] UPDATED)
    Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.
    DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation. (arXiv:2310.12570v1 [eess.IV])
    Great progress has been made in automatic medical image segmentation due to powerful deep representation learning. The influence of the Transformer has led to research into its variants and to large-scale replacement of traditional CNN modules. However, such a trend often overlooks the intrinsic feature extraction capabilities of the Transformer and potential refinements to both the model and the Transformer module through minor adjustments. This study proposes a novel deep medical image segmentation framework, called DA-TransUNet, aiming to introduce the Transformer and dual attention block into the encoder and decoder of the traditional U-shaped architecture. Unlike prior Transformer-based solutions, our DA-TransUNet utilizes the attention mechanism of the Transformer and the multifaceted feature extraction of the DA-Block, which can efficiently combine global, local, and multi-scale features to enhance medical image segmentation. Meanwhile, experimental results show that adding a dual attention block before the Transformer layer facilitates feature extraction in the U-Net structure. Furthermore, incorporating dual attention blocks in skip connections can enhance feature transfer to the decoder, thereby improving image segmentation performance. Experimental results across various medical image segmentation benchmarks reveal that DA-TransUNet significantly outperforms the state-of-the-art methods. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet.
    Canonical normalizing flows for manifold learning. (arXiv:2310.12743v1 [stat.ML])
    Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such a manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used to optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective mapping. Recent approaches were able to enforce that density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Towards this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and orthogonal basis functions. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and consequently a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.
    Transformer-based Entity Legal Form Classification. (arXiv:2310.12766v1 [cs.CL])
    We propose the application of Transformer-based language models for classifying entity legal forms from raw legal entity names. Specifically, we employ various BERT variants and compare their performance against multiple traditional baselines. Our evaluation encompasses a substantial subset of freely available Legal Entity Identifier (LEI) data, comprising over 1.1 million legal entities from 30 different legal jurisdictions. The ground truth labels for classification per jurisdiction are taken from the Entity Legal Form (ELF) code standard (ISO 20275). Our findings demonstrate that pre-trained BERT variants outperform traditional text classification approaches in terms of F1 score, while also performing comparably well in the Macro F1 Score. Moreover, the validity of our proposal is supported by the outcome of third-party expert reviews conducted in ten selected jurisdictions. This study highlights the significant potential of Transformer-based models in advancing data standardization and data integration. The presented approaches can greatly benefit financial institutions, corporations, governments and other organizations in assessing business relationships, understanding risk exposure, and promoting effective governance.
    Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach. (arXiv:2310.12428v1 [stat.ML])
    We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.
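The claim that a random forest prediction can be rewritten exactly as a weighted average of training labels can be checked on a toy example. The sketch below is not the paper's code: the function name `proximity_weights` and the hand-written leaf assignments are hypothetical, and it assumes every tree was fit on the full training set (no bootstrap) with each leaf predicting the mean label of its members.

```python
def proximity_weights(train_leaves, query_leaves):
    """Weight of each training point in the forest prediction for one query.

    train_leaves[t][i] -- leaf id of training point i in tree t
    query_leaves[t]    -- leaf id the query falls into in tree t
    """
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, query_leaf in zip(train_leaves, query_leaves):
        # Training points sharing the query's leaf are its "neighbors" in this tree.
        members = [i for i, leaf in enumerate(leaves) if leaf == query_leaf]
        for i in members:
            weights[i] += 1.0 / (n_trees * len(members))
    return weights


labels = [1.0, 2.0, 3.0, 4.0]
train_leaves = [[0, 0, 1, 1],   # tree 0
                [0, 1, 1, 1]]   # tree 1
weights = proximity_weights(train_leaves, query_leaves=[0, 1])
prediction = sum(w * y for w, y in zip(weights, labels))
```

Here tree 0 predicts the mean of labels 1 and 2, tree 1 the mean of labels 2, 3 and 4, and the weighted sum over training labels reproduces their average. The individual weights are the per-observation attributions the abstract refers to.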
    AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity. (arXiv:2310.12162v1 [cs.CR])
    This position paper explores the broad landscape of AI potentiality in the context of cybersecurity, with a particular emphasis on its possible risk factors with awareness, which can be managed by incorporating human experts in the loop, i.e., "Human-AI" teaming. As artificial intelligence (AI) technologies advance, they will provide unparalleled opportunities for attack identification, incident response, and recovery. However, the successful deployment of AI into cybersecurity measures necessitates an in-depth understanding of its capabilities, challenges, and ethical and legal implications to handle associated risk factors in real-world application areas. Towards this, we emphasize the importance of a balanced approach that incorporates AI's computational power with human expertise. AI systems may proactively discover vulnerabilities and detect anomalies through pattern recognition, and predictive modeling, significantly enhancing speed and accuracy. Human experts can explain AI-generated decisions to stakeholders, regulators, and end-users in critical situations, ensuring responsibility and accountability, which helps establish trust in AI-driven security solutions. Therefore, in this position paper, we argue that human-AI teaming is worthwhile in cybersecurity, in which human expertise such as intuition, critical thinking, or contextual understanding is combined with AI's computational power to improve overall cyber defenses.
    Preliminary studies: Comparing LSTM and BLSTM Deep Neural Networks for Power Consumption Prediction. (arXiv:2305.16546v2 [cs.LG] UPDATED)
    Electric consumption prediction methods are investigated for many reasons, such as decision-making related to energy efficiency and anticipating demand in energy market dynamics. The objective of the present work is to compare two Deep Learning models, namely the Long Short-Term Memory (LSTM) and Bi-directional LSTM (BLSTM), for univariate electric consumption Time Series (TS) short-term forecasting. The Data Sets (DSs) were selected for their different contexts and scales, with the aim of assessing the models' robustness. Four DSs were used, related to the power consumption of: (a) a household in France; (b) a university building in Santarém, Brazil; (c) the Tétouan city zones, in Morocco; and (d) the Singapore aggregated electric demand. The metrics RMSE, MAE, MAPE and R2 were calculated in a TS cross-validation scheme. Friedman's test was applied to normalized RMSE (NRMSE) results, showing that BLSTM outperforms LSTM with a statistically significant difference (p = 0.0455), corroborating the fact that bidirectional weight updating significantly improves LSTM performance across different scales of electric power consumption.
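    The four metrics named in the abstract have standard textbook definitions, which can be sketched as follows. The function name `forecast_metrics` and the toy numbers are illustrative assumptions, not taken from the paper, and the cross-validation scheme and NRMSE normalization are not reproduced here.

```python
import math


def forecast_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Toy usage with made-up values.
metrics = forecast_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 6.0, 10.0])
```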
    Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing. (arXiv:2310.12404v1 [cs.SD])
    Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting its utility not only in facilitating music creation but also its potential for broader applications.
    Knowledge from Uncertainty in Evidential Deep Learning. (arXiv:2310.12663v1 [cs.LG])
    This work reveals an evidential signal that emerges from the uncertainty value in Evidential Deep Learning (EDL). EDL is one example of a class of uncertainty-aware deep learning approaches designed to provide confidence (or epistemic uncertainty) about the current test sample. In particular, for computer vision and bidirectional encoder large language models, the "evidential signal" arising from the Dirichlet strength in EDL can, in some cases, discriminate between classes, an effect that is particularly strong when using large language models. We hypothesise that the KL regularisation term causes EDL to couple aleatoric and epistemic uncertainty. In this paper, we empirically investigate the correlations between misclassification and evaluated uncertainty, and show that EDL's "evidential signal" is due to misclassification bias. We critically evaluate EDL with other Dirichlet-based approaches, namely Generative Evidential Neural Networks (EDL-GEN) and Prior Networks, and show theoretically and empirically the differences between these loss functions. We conclude that EDL's coupling of uncertainty arises from these differences due to the use (or lack) of out-of-distribution samples during training.
    Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors. (arXiv:2304.07063v3 [cs.AI] UPDATED)
    Reasoning on knowledge graphs is a challenging task because it utilizes observed information to predict the missing one. Particularly, answering complex queries based on first-order logic is one of the crucial tasks to verify learning to reason abilities for generalization and composition. Recently, the prevailing method is query embedding which learns the embedding of a set of entities and treats logic operations as set operations and has shown great empirical success. Though there has been much research following the same formulation, many of its claims lack a formal and systematic inspection. In this paper, we rethink this formulation and justify many of the previous claims by characterizing the scope of queries investigated previously and precisely identifying the gap between its formulation and its goal, as well as providing complexity analysis for the currently investigated queries. Moreover, we develop a new dataset containing ten new types of queries with features that have never been considered and therefore can provide a thorough investigation of complex queries. Finally, we propose a new neural-symbolic method, Fuzzy Inference with Truth value (FIT), where we equip the neural link predictors with fuzzy logic theory to support end-to-end learning using complex queries with provable reasoning capability. Empirical results show that our method outperforms previous methods significantly in the new dataset and also surpasses previous methods in the existing dataset at the same time.
    2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision. (arXiv:2310.12817v1 [cs.CV])
    We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Research studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Considering the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand. To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. Specifically, the two encoders compute the self-attended features for 3D point clouds and 2D multi-view images, respectively. The decoder implements interlaced 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion. We alternately switch the roles of queries and key-value pairs in the decoder layers. It turns out that the 2D and 3D features are iteratively enriched by each other. Experiments show that it performs favorably against existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks. The project page will be available at https://jimmy15923.github.io/mit_web/.
    Model Merging by Uncertainty-Based Gradient Matching. (arXiv:2310.12808v1 [cs.LG])
    Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.
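    For orientation, two of the baseline schemes the abstract mentions (plain weighted averaging and task arithmetic) can be sketched on toy parameter dictionaries. This does not implement the paper's uncertainty-based gradient-matching method. The function names, the dict-of-floats parameter format and the `scale` argument are assumptions made for illustration.

```python
def weighted_average(models, weights):
    """Merge models by a normalized weighted average of each parameter."""
    total = sum(weights)
    return {
        name: sum(w * m[name] for w, m in zip(weights, models)) / total
        for name in models[0]
    }


def task_arithmetic(base, models, scale=1.0):
    """Add the scaled sum of task vectors (model minus base) to the base model."""
    return {
        name: base[name] + scale * sum(m[name] - base[name] for m in models)
        for name in base
    }


# Toy usage: each "model" is a dict mapping a parameter name to a scalar.
model_a = {"w": 1.0, "b": 0.0}
model_b = {"w": 3.0, "b": 2.0}
merged = weighted_average([model_a, model_b], [1.0, 1.0])
```

    Real checkpoints hold tensors rather than scalars, but the per-parameter arithmetic is the same.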
    Knowledge-Augmented Language Model Verification. (arXiv:2310.12836v1 [cs.CL])
    Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. Yet, LMs often generate factually incorrect responses to the given queries, since their knowledge may be inaccurate, incomplete, or outdated. To address this problem, previous works propose to augment LMs with the knowledge retrieved from an external knowledge source. However, such approaches often show suboptimal text generation performance for two reasons: 1) the model may fail to retrieve the knowledge relevant to the given query, or 2) the model may not faithfully reflect the retrieved knowledge in the generated text. To overcome these, we propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier, which is a small LM that is trained to detect those two types of errors through instruction-finetuning. Then, when the verifier recognizes an error, we can rectify it by either retrieving new knowledge or generating new text. Further, we use an ensemble of the outputs from different instructions with a single verifier to enhance the reliability of the verification processes. We validate the effectiveness of the proposed verification steps on multiple question answering benchmarks, whose results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs. Our code is available at https://github.com/JinheonBaek/KALMV.
    AgentTuning: Enabling Generalized Agent Abilities for LLMs. (arXiv:2310.12823v1 [cs.CL])
    Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is a lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities. The AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open source the AgentInstruct and AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning , serving as open and powerful alternatives to commercial LLMs for agent tasks.
    Learn from the Past: A Proxy based Adversarial Defense Framework to Boost Robustness. (arXiv:2310.12713v1 [cs.LG])
    In light of the vulnerability of deep learning models to adversarial samples and the ensuing security issues, a range of methods aimed at enhancing model robustness against various adversarial attacks, with Adversarial Training (AT) as a prominent representative, have seen rapid development. However, existing methods essentially assist the current state of the target model to defend against parameter-oriented adversarial attacks with explicit or implicit computation burdens, and they also suffer from unstable convergence behavior due to inconsistency of optimization trajectories. Diverging from previous work, this paper reconsiders the update rule of the target model and the corresponding deficiency of defending based on its current state. By introducing the historical state of the target model as a proxy, which is endowed with much prior information for defense, we formulate a two-stage update rule, resulting in a general adversarial defense framework, which we refer to as "LAST" (Learn from the Past). In addition, we devise a Self Distillation (SD) based defense objective to constrain the update process of the proxy model without the introduction of larger teacher models. Experimentally, we demonstrate consistent and significant performance enhancements by refining a series of single-step and multi-step AT methods (e.g., up to 9.2% and 20.5% improvement of Robust Accuracy (RA) on CIFAR10 and CIFAR100 datasets, respectively) across various datasets, backbones and attack modalities, and we also validate its ability to enhance training stability and ameliorate catastrophic overfitting issues.
    OceanGPT: A Large Language Model for Ocean Science Tasks. (arXiv:2310.02031v3 [cs.CL] UPDATED)
    Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reason may be the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever LLM in the ocean domain, which is expert in various ocean science tasks. We propose DoInstruct, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Through comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for ocean science tasks but also gains preliminary embodied intelligence capabilities in ocean technology. Codes, data and checkpoints will soon be available at https://github.com/zjunlp/KnowLM.
    Causal Similarity-Based Hierarchical Bayesian Models. (arXiv:2310.12595v1 [cs.LG])
    The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each task's data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and applicability to real world problems through a range of experiments on simulated and real data.
    Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights. (arXiv:2310.12462v1 [cs.LG])
    In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and outputs? We introduce a theoretical framework to tackle this problem. Specifically, we present an algorithm that aims to recover the input data $X \in \mathbb{R}^{d \times n}$ from given attention weights $W = QK^\top \in \mathbb{R}^{d \times d}$ and output $B \in \mathbb{R}^{n \times n}$ by minimizing the loss function $L(X)$. This loss function captures the discrepancy between the expected output and the actual output of the transformer. Our findings have significant implications for the Localized Layer-wise Mechanism (LLM), suggesting potential vulnerabilities in the model's design from a security and privacy perspective. This work underscores the importance of understanding and safeguarding the internal workings of transformers to ensure the confidentiality of processed data.
    Conditional Density Estimations from Privacy-Protected Data. (arXiv:2310.12781v1 [stat.ML])
    Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only the privatized data during statistical analysis makes it computationally challenging to perform valid inferences on parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. Specifically, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
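    As a concrete instance of "injecting calibrated noise", the textbook Laplace mechanism for a counting query is sketched below. The abstract does not say which mechanism the paper uses, so this is an illustrative assumption, as are the names `laplace_noise` and `private_count` and the toy data.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Toy usage: a noisy answer to "how many values are below 30?".
rng = random.Random(0)
data = list(range(100))
noisy_answer = private_count(data, lambda v: v < 30, epsilon=1.0, rng=rng)
```

    A smaller `epsilon` means a larger noise scale, which is the privacy-utility trade-off the abstract refers to: an analyst who only sees `noisy_answer` has to account for that noise when inferring the underlying parameters.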
    Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models. (arXiv:2310.12818v1 [cs.CL])
    Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise. However, it is important to note that parameter sharing does not alleviate computational burdens associated with inference, thus impeding its practicality in situations characterized by stringent latency requirements or limited computational resources. Building upon neural ordinary differential equations (ODEs), we introduce a straightforward technique to enhance the inference efficiency of parameter-shared PLMs. Additionally, we propose a simple pre-training technique that leads to fully or partially shared models capable of achieving even greater inference acceleration. The experimental results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs, providing novel insights into more efficient utilization of parameter-shared models in resource-constrained settings.
    Fast Model Debias with Machine Unlearning. (arXiv:2310.12560v1 [cs.LG])
    Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on a large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, while also falling short in elucidating the origins of biases within the model. In this respect, we propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases inherent in trained models. The FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets along with experiments with large language models demonstrate that our method achieves superior or competitive accuracies compared with state-of-the-art methods while attaining significantly fewer biases and requiring much less debiasing cost. Notably, our method requires only a small external dataset and updating a minimal amount of model parameters, without the requirement of access to training data that may be too large or unavailable in practice.
    An Improved Metarounding Algorithm via Frank-Wolfe. (arXiv:2310.12629v1 [cs.DS])
    Metarounding is an approach to convert an approximation algorithm for linear optimization over some combinatorial classes to an online linear optimization algorithm for the same class. We propose a new metarounding algorithm under a natural assumption that a relax-based approximation algorithm exists for the combinatorial class. Our algorithm is much more efficient in both theoretical and practical aspects.
    Label-Aware Automatic Verbalizer for Few-Shot Text Classification. (arXiv:2310.12778v1 [cs.CL])
    Prompt-based learning has shown its effectiveness in few-shot text classification. One important factor in its success is a verbalizer, which translates output from a language model into a predicted class. Notably, the simplest and widely acknowledged verbalizer employs manual labels to represent the classes. However, manual selection does not guarantee the optimality of the selected words when conditioned on the chosen language model. Therefore, we propose Label-Aware Automatic Verbalizer (LAAV), effectively augmenting the manual labels to achieve better few-shot classification results. Specifically, we use the manual labels along with the conjunction "and" to induce the model to generate more effective words for the verbalizer. The experimental results on five datasets across five languages demonstrate that LAAV significantly outperforms existing verbalizers. Furthermore, our analysis reveals that LAAV suggests more relevant words compared to similar approaches, especially in mid-to-low resource languages.
    MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features. (arXiv:2310.12451v1 [cs.LG])
    Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of joint-embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.
    Improved Operator Learning by Orthogonal Attention. (arXiv:2310.12487v1 [cs.LG])
    Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the mainstreams in related research. However, existing approaches overfit the limited training data due to the considerable number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally poses a proper regularization effect on the resulting neural operator, which aids in resisting overfitting and boosting generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing baselines with decent margins.
    Attack Prompt Generation for Red Teaming and Defending Large Language Models. (arXiv:2310.12505v1 [cs.CL])
    Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, which have their own limitations on construction cost and quality. To address these issues, we propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts. Specifically, considering the impressive capabilities of newly emerged LLMs, we propose an attack framework to instruct LLMs to mimic human-generated prompts through in-context learning. Furthermore, we propose a defense framework that fine-tunes victim LLMs through iterative interactions with the attack framework to enhance their safety against red teaming attacks. Extensive experiments on different LLMs validate the effectiveness of our proposed attack and defense frameworks. Additionally, we release a series of attack prompt datasets named SAP with varying sizes, facilitating the safety evaluation and enhancement of more LLMs. Our code and dataset are available at https://github.com/Aatrox103/SAP .
    Piecewise Deterministic Markov Processes for Bayesian Neural Networks. (arXiv:2302.08724v2 [stat.ML] UPDATED)
    Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing violated assumptions of independence and the form of the posterior. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to their incompatibility with subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though they introduce model-specific inhomogeneous Poisson Processes (IPPs) which are difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs. Experimentation illustrates how inference with these methods is computationally feasible, can improve predictive accuracy and MCMC mixing performance, and can provide informative uncertainty measurements when compared against other approximate inference schemes.
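    For readers unfamiliar with thinning, the classical fixed-bound version (Lewis-Shedler) is sketched below. It is not the paper's adaptive scheme: it assumes a constant upper bound `rate_max` on the intensity is known in advance, which is the requirement an adaptive scheme is meant to relax. The function name and the sinusoidal toy intensity are illustrative assumptions.

```python
import math
import random


def sample_ipp_by_thinning(rate, rate_max, horizon, rng):
    """Sample event times of an inhomogeneous Poisson process on (0, horizon].

    Requires rate(t) <= rate_max for all t in the interval.
    """
    events = []
    t = 0.0
    while True:
        # Propose the next event from a homogeneous process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t > horizon:
            return events
        # Keep the proposal with probability rate(t) / rate_max.
        if rng.random() * rate_max < rate(t):
            events.append(t)


# Toy usage: intensity 1 + sin(t), bounded above by 2.
rng = random.Random(1)
times = sample_ipp_by_thinning(lambda t: 1.0 + math.sin(t), 2.0, 100.0, rng)
```

    The looser the bound `rate_max`, the more proposals are rejected, which is why a tight or adaptive bound matters for efficiency.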
    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach. (arXiv:2207.06949v4 [stat.ML] UPDATED)
    Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, aiming to construct well-established clusters whose elements are classified according to their similarity. The goal of this process is to provide a useful aid to researchers that helps them identify patterns in the data. When dealing with large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides an in-depth description of the most widely used clustering methodologies, accompanied by useful presentations concerning suitable parameter selection and initializations. At the same time, this article not only serves as a review highlighting the major elements of the examined clustering techniques but also emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets, revealing their weaknesses and capabilities in terms of accuracy and complexity when confronted with discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques in accordance with the dataset's size.
    OODRobustBench: benchmarking and analyzing adversarial robustness under distribution shift. (arXiv:2310.12793v1 [cs.LG])
    Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This is a concerning omission as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness, in a positive linear way, under many distribution shifts. The latter enables the prediction of OOD robustness from ID robustness. Based on this, we are able to predict the upper limit of OOD robustness for existing robust training schemes. The results suggest that achieving OOD robustness requires designing novel methods beyond the conventional ones. Last, we discover that extra data, data augmentation, advanced model architectures and particular regularization approaches can improve OOD robustness. Noticeably, the discovered training schemes, compared to the baseline, exhibit dramatically higher robustness under threat shift while keeping high ID robustness, demonstrating new promising solutions for robustness against both multi-attack and unforeseen attacks.
    Energy-Based Models For Speech Synthesis. (arXiv:2310.12765v1 [cs.SD])
    Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs, which makes inference efficient. This paper expands the range of available non-AR models with another member called energy-based models (EBMs). The paper describes how noise contrastive estimation, which relies on the comparison between positive and negative samples, can be used to train EBMs. It proposes a number of strategies for generating effective negative samples, including using high-performing AR models. It also describes how sampling from EBMs can be performed using Langevin Markov Chain Monte-Carlo (MCMC). The use of Langevin MCMC makes it possible to draw connections between EBMs and currently popular diffusion models. Experiments on the LJSpeech dataset show that the proposed approach offers improvements over Tacotron 2.
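    The Langevin sampling step mentioned above can be illustrated with a one-dimensional toy energy. This is a generic unadjusted Langevin sketch, not the paper's speech model. The quadratic energy E(x) = (x - 2)^2 / 2, the step size and the function name are assumptions chosen so that the target distribution is a unit-variance Gaussian centred at 2.

```python
import math
import random


def langevin_sample(grad_energy, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics targeting a density proportional to exp(-E(x)).

    Each update moves down the energy gradient and adds Gaussian noise:
        x <- x - step * grad_E(x) + sqrt(2 * step) * N(0, 1)
    """
    x = x0
    chain = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_energy(x) + math.sqrt(2.0 * step) * noise
        chain.append(x)
    return chain


# Toy usage: E(x) = (x - 2)^2 / 2, so grad_E(x) = x - 2.
rng = random.Random(0)
chain = langevin_sample(lambda x: x - 2.0, x0=0.0, step=0.05, n_steps=20000, rng=rng)
samples = chain[1000:]  # discard an initial burn-in segment
```

    The gradient-plus-noise form of this update resembles the reverse steps of score-based diffusion samplers, which is one way to read the connection the abstract draws.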
    Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers. (arXiv:2310.12528v1 [astro-ph.IM])
    Machine learning has rapidly become a tool of choice for the astronomical community. It is being applied across a wide range of wavelengths and problems, from the classification of transients to neural network emulators of cosmological simulations, and is shifting paradigms about how we generate and report scientific results. At the same time, this class of method comes with its own set of best practices, challenges, and drawbacks, which, at present, are often reported on incompletely in the astrophysical literature. With this paper, we aim to provide a primer to the astronomical community, including authors, reviewers, and editors, on how to implement machine learning models and report their results in a way that ensures the accuracy of the results, reproducibility of the findings, and usefulness of the method.
    Document-Level Language Models for Machine Translation. (arXiv:2310.12303v1 [cs.CL])
    Despite the known limitations, most machine translation systems today still operate at the sentence level. One reason for this is that most parallel training data is only aligned at the sentence level, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores substantially and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination.  ( 2 min )
    Improving SCGAN's Similarity Constraint and Learning a Better Disentangled Representation. (arXiv:2310.12262v1 [cs.CV])
    SCGAN adds a similarity constraint between generated images and conditions as a regularization term on generative adversarial networks. The similarity constraint works as a tutor that instructs the generator network to comprehend the difference of representations based on conditions. We examine how SCGAN works at a deeper level. This understanding leads us to realize that the similarity constraint functions like a contrastive loss function. We believe that a model with high understanding and intelligence measures the similarity between images based on their structure and high-level features, just like humans do. The two major changes we apply to SCGAN to obtain a modified model are using SSIM to measure similarity between images and applying contrastive loss principles to the similarity constraint. The modified model performs better under the FID and FactorVAE metrics. The modified model also has better generalisability compared to other models. Keywords: Generative Adversarial Nets, Unsupervised Learning, Disentangled Representation Learning, Contrastive Disentanglement, SSIM  ( 2 min )
    American Option Pricing using Self-Attention GRU and Shapley Value Interpretation. (arXiv:2310.12500v1 [q-fin.PR])
    Options, serving as a crucial financial instrument, are used by investors to manage and mitigate their investment risks within the securities market. Precisely predicting the present price of an option enables investors to make informed and efficient decisions. In this paper, we propose a machine learning method for forecasting the prices of SPY (ETF) option based on gated recurrent unit (GRU) and self-attention mechanism. We first partitioned the raw dataset into 15 subsets according to moneyness and days to maturity criteria. For each subset, we matched the corresponding U.S. government bond rates and Implied Volatility Indices. This segmentation allows for a more insightful exploration of the impacts of risk-free rates and underlying volatility on option pricing. Next, we built four different machine learning models, including multilayer perceptron (MLP), long short-term memory (LSTM), self-attention LSTM, and self-attention GRU in comparison to the traditional binomial model. The empirical result shows that self-attention GRU with historical data outperforms other models due to its ability to capture complex temporal dependencies and leverage the contextual information embedded in the historical data. Finally, in order to unveil the "black box" of artificial intelligence, we employed the SHapley Additive exPlanations (SHAP) method to interpret and analyze the prediction results of the self-attention GRU model with historical data. This provides insights into the significance and contributions of different input features on the pricing of American-style options.  ( 2 min )
    Opportunities for Adaptive Experiments to Enable Continuous Improvement that Trades-off Instructor and Researcher Incentives. (arXiv:2310.12324v1 [cs.HC])
    Randomized experimental comparisons of alternative pedagogical strategies could provide useful empirical evidence in instructors' decision-making. However, traditional experiments do not have a clear and simple pathway to using data rapidly to try to increase the chances that students in an experiment get the best conditions. Drawing inspiration from the use of machine learning and experimentation in product development at leading technology companies, we explore how adaptive experimentation might help in continuous course improvement. In adaptive experiments, as different arms/conditions are deployed to students, data is analyzed and used to change the experience for future students. This can be done using machine learning algorithms to identify which actions are more promising for improving student experience or outcomes. This algorithm can then dynamically deploy the most effective conditions to future students, resulting in better support for students' needs. We illustrate the approach with a case study providing a side-by-side comparison of traditional and adaptive experimentation of self-explanation prompts in online homework problems in a CS1 course. This provides a first step in exploring the future of how this methodology can be useful in bridging research and practice in doing continuous improvement.  ( 2 min )
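    As a rough illustration of the adaptive assignment described above, the sketch below uses Beta-Bernoulli Thompson sampling. This particular algorithm is an assumption swapped in for illustration, since the abstract does not name one; the function `thompson_assign` and the success rates are hypothetical.

```python
import numpy as np

def thompson_assign(true_rates, n_students=1000, seed=0):
    # true_rates: hypothetical per-condition success probabilities,
    # unknown to the algorithm and used only to simulate outcomes.
    rng = np.random.default_rng(seed)
    n_arms = len(true_rates)
    successes = np.ones(n_arms)  # Beta(1, 1) prior for each condition
    failures = np.ones(n_arms)
    counts = np.zeros(n_arms, dtype=int)
    for _ in range(n_students):
        # Draw a plausible success rate per condition and deploy the
        # condition whose draw is highest to the next student.
        sampled = rng.beta(successes, failures)
        arm = int(np.argmax(sampled))
        outcome = int(rng.random() < true_rates[arm])
        successes[arm] += outcome
        failures[arm] += 1 - outcome
        counts[arm] += 1
    return counts

counts = thompson_assign([0.3, 0.7])
print(counts)  # students assigned to each condition
```

    Unlike a fixed 50/50 split, the assignment shifts toward the condition that appears to work better as evidence accumulates.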
    WeaveNet for Approximating Two-sided Matching Problems. (arXiv:2310.12515v1 [cs.LG])
    Matching, a task to optimally assign limited resources under constraints, is a fundamental technology for society. The task potentially has various objectives, conditions, and constraints; however, the efficient neural network architecture for matching is underexplored. This paper proposes a novel graph neural network (GNN), WeaveNet, designed for bipartite graphs. Since a bipartite graph is generally dense, general GNN architectures lose node-wise information by over-smoothing when deeply stacked. Such a phenomenon is undesirable for solving matching problems. WeaveNet avoids it by preserving edge-wise information while passing messages densely to reach a better solution. To evaluate the model, we approximated one of the strongly NP-hard problems, fair stable matching. Despite its inherent difficulties and the network's general purpose design, our model reached a comparable performance with state-of-the-art algorithms specially designed for stable matching for small numbers of agents.  ( 2 min )
    Enhanced Graph Neural Networks with Ego-Centric Spectral Subgraph Embeddings Augmentation. (arXiv:2310.12169v1 [cs.SI])
    Graph Neural Networks (GNNs) have shown remarkable merit in performing various learning-based tasks in complex networks. The superior performance of GNNs often correlates with the availability and quality of node-level features in the input networks. However, for many network applications, such node-level information may be missing or unreliable, thereby limiting the applicability and efficacy of GNNs. To address this limitation, we present a novel approach denoted as Ego-centric Spectral subGraph Embedding Augmentation (ESGEA), which aims to enhance and design node features, particularly in scenarios where information is lacking. Our method leverages the topological structure of the local subgraph to create topology-aware node features. The subgraph features are generated using an efficient spectral graph embedding technique, and they serve as node features that capture the local topological organization of the network. The explicit node features, if present, are then enhanced with the subgraph embeddings in order to improve the overall performance. ESGEA is compatible with any GNN-based architecture and is effective even in the absence of node features. We evaluate the proposed method in a social network graph classification task where node attributes are unavailable, as well as in a node classification task where node features are corrupted or even absent. The evaluation results on seven datasets and eight baseline models indicate up to a 10% improvement in AUC and a 7% improvement in accuracy for graph and node classification tasks, respectively.  ( 3 min )
    Equipping Federated Graph Neural Networks with Structure-aware Group Fairness. (arXiv:2310.12350v1 [cs.LG])
    Graph Neural Networks (GNNs) have been widely used for various types of graph data processing and analytical tasks in different domains. Training GNNs over centralized graph data can be infeasible due to privacy concerns and regulatory restrictions. Thus, federated learning (FL) becomes a trending solution to address this challenge in a distributed learning paradigm. However, as GNNs may inherit historical bias from training data and lead to discriminatory predictions, the bias of local models can be easily propagated to the global model in distributed settings. This poses a new challenge in mitigating bias in federated GNNs. To address this challenge, we propose $\text{F}^2$GNN, a Fair Federated Graph Neural Network, that enhances group fairness of federated GNNs. As bias can be sourced from both data and learning algorithms, $\text{F}^2$GNN aims to mitigate both types of bias under federated settings. First, we provide theoretical insights on the connection between data bias in a training graph and statistical fairness metrics of the trained GNN models. Based on the theoretical analysis, we design $\text{F}^2$GNN which contains two key components: a fairness-aware local model update scheme that enhances group fairness of the local models on the client side, and a fairness-weighted global model update scheme that takes both data bias and fairness metrics of local models into consideration in the aggregation process. We evaluate $\text{F}^2$GNN empirically versus a number of baseline methods, and demonstrate that $\text{F}^2$GNN outperforms these baselines in terms of both fairness and model accuracy.  ( 3 min )
    Few-Shot In-Context Imitation Learning via Implicit Graph Alignment. (arXiv:2310.12238v1 [cs.RO])
    Consider the following problem: given a few demonstrations of a task across a few different objects, how can a robot learn to perform that same task on new, previously unseen objects? This is challenging because the large variety of objects within a class makes it difficult to infer the task-relevant relationship between the new objects and the objects in the demonstrations. We address this by formulating imitation learning as a conditional alignment problem between graph representations of objects. Consequently, we show that this conditioning allows for in-context learning, where a robot can perform a task on a set of new objects immediately after the demonstrations, without any prior knowledge about the object class or any further training. In our experiments, we explore and validate our design choices, and we show that our method is highly effective for few-shot learning of several real-world, everyday tasks, whilst outperforming baselines. Videos are available on our project webpage at https://www.robot-learning.uk/implicit-graph-alignment.  ( 2 min )
    ClusT3: Information Invariant Test-Time Training. (arXiv:2310.12345v1 [cs.CV])
    Deep Learning models have shown remarkable performance in a broad range of vision tasks. However, they are often vulnerable to domain shifts at test-time. Test-time training (TTT) methods have been developed in an attempt to mitigate these vulnerabilities, where a secondary task is solved at training time simultaneously with the main task, to be later used as a self-supervised proxy task at test-time. In this work, we propose a novel unsupervised TTT technique based on the maximization of Mutual Information between multi-scale feature maps and a discrete latent representation, which can be integrated into the standard training as an auxiliary clustering task. Experimental results demonstrate competitive classification performance on different popular test-time adaptation benchmarks.  ( 2 min )
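    To make the mutual-information objective above concrete, here is a minimal sketch of a common batch estimator for soft cluster assignments, I(X; Z) = H(mean assignment) - mean H(assignment). It is an assumption that this estimator resembles the paper's objective; the function name `mutual_information` and the toy inputs are hypothetical.

```python
import numpy as np

def mutual_information(probs, eps=1e-12):
    # probs: array of shape (n_samples, n_clusters); each row is a soft
    # assignment p(z | x) over discrete clusters and sums to 1.
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # estimate of p(z) over the batch
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional

confident = mutual_information([[1.0, 0.0], [0.0, 1.0]])
uninformative = mutual_information([[0.5, 0.5], [0.5, 0.5]])
print(confident, uninformative)
```

    The estimate is highest when each sample is assigned confidently and the clusters are used evenly, and it drops to zero when every sample receives the same assignment.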
    Learning to Solve Climate Sensor Placement Problems with a Transformer. (arXiv:2310.12387v1 [cs.LG])
    The optimal placement of sensors for environmental monitoring and disaster management is a challenging problem due to its NP-hard nature. Traditional methods for sensor placement involve exact, approximation, or heuristic approaches, with the latter being the most widely used. However, heuristic methods are limited by expert intuition and experience. Deep learning (DL) has emerged as a promising approach for generating heuristic algorithms automatically. In this paper, we introduce a novel sensor placement approach focused on learning improvement heuristics using deep reinforcement learning (RL) methods. Our approach leverages an RL formulation for learning improvement heuristics, driven by an actor-critic algorithm for training the policy network. We compare our method with several state-of-the-art approaches by conducting comprehensive experiments, demonstrating the effectiveness and superiority of our proposed approach in producing high-quality solutions. Our work presents a promising direction for applying advanced DL and RL techniques to challenging climate sensor placement problems.  ( 2 min )
    Enhancing the Performance of Automated Grade Prediction in MOOC using Graph Representation Learning. (arXiv:2310.12281v1 [cs.LG])
    In recent years, Massive Open Online Courses (MOOCs) have gained significant traction as a rapidly growing phenomenon in online learning. Unlike traditional classrooms, MOOCs offer a unique opportunity to cater to a diverse audience from different backgrounds and geographical locations. Renowned universities and MOOC-specific providers, such as Coursera, offer MOOC courses on various subjects. Automated assessment tasks like grade and early dropout predictions are necessary due to the high enrollment and limited direct interaction between teachers and learners. However, current automated assessment approaches overlook the structural links between different entities involved in the downstream tasks, such as the students and courses. Our hypothesis suggests that these structural relationships, manifested through an interaction graph, contain valuable information that can enhance the performance of the task at hand. To validate this, we construct a unique knowledge graph for a large MOOC dataset, which will be publicly available to the research community. Furthermore, we utilize graph embedding techniques to extract latent structural information encoded in the interactions between entities in the dataset. These techniques do not require ground truth labels and can be utilized for various tasks. Finally, by combining entity-specific features, behavioral features, and extracted structural features, we enhance the performance of predictive machine learning models in student assignment grade prediction. Our experiments demonstrate that structural features can significantly improve the predictive performance of downstream assessment tasks. The code and data are available at https://github.com/DSAatUSU/MOOPer_grade_prediction  ( 3 min )
    MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits. (arXiv:2310.12359v1 [cs.MA])
    Variable speed limit (VSL) control is a promising traffic management strategy for enhancing safety and mobility. This work introduces MARVEL, a multi-agent reinforcement learning (MARL) framework for implementing large-scale VSL control on freeway corridors using only commonly available data. The agents learn through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility, enabling coordination among the agents. The proposed framework scales to cover corridors with many gantries thanks to parameter sharing among all VSL agents. The agents are trained in a microsimulation environment based on a short freeway stretch with 8 gantries spanning 7 miles and tested with 34 gantries spanning 17 miles of I-24 near Nashville, TN. MARVEL improves traffic safety by 63.4% compared to the no control scenario and enhances traffic mobility by 14.6% compared to a state-of-the-practice algorithm that has been deployed on I-24. An explainability analysis is undertaken to explore the learned policy under different traffic conditions and the results provide insights into the decision-making process of agents. Finally, we test the policy learned from the simulation-based experiments on real input data from I-24 to illustrate the potential deployment capability of the learned policy.  ( 2 min )
    Architectural Implications of GNN Aggregation Programming Abstractions. (arXiv:2310.12184v1 [cs.LG])
    Graph neural networks (GNNs) have gained significant popularity due to their powerful capability to extract useful representations from graph data. As the need for efficient GNN computation intensifies, a variety of programming abstractions designed for optimizing GNN Aggregation have emerged to facilitate acceleration. However, there is no comprehensive evaluation and analysis of existing abstractions, and thus no clear consensus on which approach is better. In this letter, we classify existing programming abstractions for GNN Aggregation by the dimension of data organization and propagation method. By constructing these abstractions on a state-of-the-art GNN library, we perform a thorough and detailed characterization study to compare their performance and efficiency, and provide several insights on future GNN acceleration based on our analysis.  ( 2 min )
    Balanced Group Convolution: An Improved Group Convolution Based on Approximability Estimates. (arXiv:2310.12461v1 [cs.LG])
    The performance of neural networks has been significantly improved by increasing the number of channels in convolutional layers. However, this increase in performance comes with a higher computational cost, resulting in numerous studies focused on reducing it. One promising approach to address this issue is group convolution, which effectively reduces the computational cost by grouping channels. However, to the best of our knowledge, there has been no theoretical analysis on how well the group convolution approximates the standard convolution. In this paper, we mathematically analyze the approximation of the group convolution to the standard convolution with respect to the number of groups. Furthermore, we propose a novel variant of the group convolution called balanced group convolution, which shows a higher approximation with a small additional computational cost. We provide experimental results that validate our theoretical findings and demonstrate the superior performance of the balanced group convolution over other variants of group convolution.  ( 2 min )
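    The cost saving from grouping channels, as described above, can be checked with a small weight-count calculation. This is a sketch of standard group convolution only (not the paper's balanced variant); the helper `conv_params` and the layer sizes are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    # Number of weights (bias excluded) in a k x k convolution whose input
    # and output channels are split into `groups` independent groups.
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in // groups input channels.
    return (c_in // groups) * c_out * k * k

standard = conv_params(64, 128, 3)           # 64 * 128 * 9 = 73728
grouped = conv_params(64, 128, 3, groups=4)  # 16 * 128 * 9 = 18432
print(standard, grouped, standard // grouped)
```

    With the same input and output widths, both the weight count and the multiply-accumulate count per output position shrink by a factor equal to the number of groups.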
    RK-core: An Established Methodology for Exploring the Hierarchical Structure within Datasets. (arXiv:2310.12168v1 [cs.LG])
    Recently, the field of machine learning has undergone a transition from model-centric to data-centric. The advancements in diverse learning tasks have been propelled by the accumulation of more extensive datasets, subsequently facilitating the training of larger models on these datasets. However, these datasets remain relatively under-explored. To this end, we introduce a pioneering approach known as RK-core, which enables a deeper understanding of the intricate hierarchical structure within datasets. Across several benchmark datasets, we find that samples with low coreness values appear less representative of their respective categories, and conversely, those with high coreness values exhibit greater representativeness. Correspondingly, samples with high coreness values make a more substantial contribution to the performance in comparison to those with low coreness values. Building upon this, we further employ RK-core to analyze the hierarchical structure of samples with different coreset selection methods. Remarkably, we find that a high-quality coreset should exhibit hierarchical diversity instead of solely opting for representative samples. The code is available at https://github.com/yaolu-zjut/Kcore.  ( 2 min )
    Open-Set Multivariate Time-Series Anomaly Detection. (arXiv:2310.12294v1 [cs.LG])
    Numerous methods for time series anomaly detection (TSAD) have emerged in recent years. Most existing methods are unsupervised and assume the availability of normal training samples only, while a few supervised methods have shown superior performance by incorporating labeled anomalous samples in the training phase. However, certain anomaly types are inherently challenging for unsupervised methods to differentiate from normal data, while supervised methods are constrained to detecting anomalies resembling those present during training, failing to generalize to unseen anomaly classes. This paper is the first attempt at providing a novel approach for the open-set TSAD problem, in which a small number of labeled anomalies from a limited class of anomalies are visible in the training phase, with the objective of detecting both seen and unseen anomaly classes in the test phase. The proposed method, called Multivariate Open-Set timeseries Anomaly Detection (MOSAD), consists of three primary modules: a Feature Extractor to extract meaningful time-series features; a Multi-head Network consisting of Generative-, Deviation-, and Contrastive heads for capturing both seen and unseen anomaly classes; and an Anomaly Scoring module leveraging the insights of the three heads to detect anomalies. Extensive experiments on three real-world datasets consistently show that our approach surpasses existing methods under various experimental settings, thus establishing a new state-of-the-art performance in the TSAD field.  ( 2 min )
    Preference Optimization for Molecular Language Models. (arXiv:2310.12304v1 [stat.ML])
    Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not a priori encode certain preferences a chemist may desire. We investigate the use of fine-tuning using Direct Preference Optimization to better align generated molecules with chemist preferences. Our findings suggest that this approach is simple, efficient, and highly effective.  ( 2 min )
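    For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. How the paper builds its molecule pairs is not stated in the abstract; the function `dpo_loss`, the `beta` default, and the example log-probabilities are hypothetical.

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # *_w: log-probability of the preferred sequence (e.g. a molecule string),
    # *_l: log-probability of the dispreferred one, under the fine-tuned
    # policy and under the frozen reference model.
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable softplus form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # policy equals reference
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy favours the preferred sample
print(neutral, improved)
```

    The loss starts at log 2 when the policy matches the reference and decreases as the policy raises the preferred sample's likelihood relative to the dispreferred one.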
    SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation. (arXiv:2310.12508v1 [cs.LG])
    With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often grapple with limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' in MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting dataset). To the best of our knowledge, SalUn is the first principled MU approach adaptable enough to effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.  ( 2 min )
    SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models. (arXiv:2310.12494v1 [cs.LG])
    Understanding the long-term impact of algorithmic interventions on society is vital to achieving responsible AI. Traditional evaluation strategies often fall short due to the complex, adaptive and dynamic nature of society. While reinforcement learning (RL) can be a powerful approach for optimizing decisions in dynamic settings, the difficulty of realistic environment design remains a barrier to building robust agents that perform well in practical settings. To address this issue we tap into the field of system dynamics (SD) as a complementary method that incorporates collaborative simulation model specification practices. We introduce SDGym, a low-code library built on the OpenAI Gym framework which enables the generation of custom RL environments based on SD simulation models. Through a feasibility study we validate that well specified, rich RL environments can be generated from preexisting SD models and a few lines of configuration code. We demonstrate the capabilities of the SDGym environment using an SD model of the electric vehicle adoption problem. We compare two SD simulators, PySD and BPTK-Py for parity, and train a D4PG agent using the Acme framework to showcase learning and environment interaction. Our preliminary findings underscore the dual potential of SD to improve RL environment design and for RL to improve dynamic policy discovery within SD models. By open-sourcing SDGym, the intent is to galvanize further research and promote adoption across the SD and RL communities, thereby catalyzing collaboration in this emerging interdisciplinary space.  ( 2 min )
    Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization. (arXiv:2310.12298v1 [cs.LG])
    Despite their better convergence properties compared to first-order optimizers, second-order optimizers for deep learning have been less popular due to their significant computational costs. The primary efficiency bottleneck in such optimizers is matrix inverse calculations in the preconditioning step, which are expensive to compute on GPUs. In this paper, we introduce Jorge, a second-order optimizer that promises the best of both worlds -- rapid convergence benefits of second-order methods, and high computational efficiency typical of first-order methods. We address the primary computational bottleneck of computing matrix inverses by completely eliminating them using an approximation of the preconditioner computation. This makes Jorge extremely efficient on GPUs in terms of wall-clock time. Further, we describe an approach to determine Jorge's hyperparameters directly from a well-tuned SGD baseline, thereby significantly minimizing tuning efforts. Our empirical evaluations demonstrate the distinct advantages of using Jorge, outperforming state-of-the-art optimizers such as SGD, AdamW, and Shampoo across multiple deep learning models, both in terms of sample efficiency and wall-clock time.  ( 2 min )
    CAT: Closed-loop Adversarial Training for Safe End-to-End Driving. (arXiv:2310.12432v1 [cs.LG])
    Driving safety is a top priority for autonomous vehicles. Orthogonal to prior work handling accident-prone traffic events by algorithm designs at the policy level, we investigate a Closed-loop Adversarial Training (CAT) framework for safe end-to-end driving in this paper through the lens of environment augmentation. CAT aims to continuously improve the safety of driving agents by training the agent on safety-critical scenarios that are dynamically generated over time. A novel resampling technique is developed to turn log-replay real-world driving scenarios into safety-critical ones via probabilistic factorization, where the adversarial traffic generation is modeled as the multiplication of standard motion prediction sub-problems. Consequently, CAT can launch more efficient physical attacks compared to existing safety-critical scenario generation methods and incurs a significantly lower computational cost in the iterative learning pipeline. We incorporate CAT into the MetaDrive simulator and validate our approach on hundreds of driving scenarios imported from real-world driving datasets. Experimental results demonstrate that CAT can effectively generate adversarial scenarios countering the agent being trained. After training, the agent can achieve superior driving safety in both log-replay and safety-critical traffic scenarios on the held-out test set. Code and data are available at https://metadriverse.github.io/cat.  ( 2 min )
  • Open

    Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching. (arXiv:2306.07960v2 [cs.LG] UPDATED)
    Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances. This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations. We arrive at this finding analytically, by establishing that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an orthogonal frame. Extensive experiments conducted across various datasets, architectures, and imbalance scenarios corroborate our finding. Importantly, our experiments reveal that the inclusion of the ReLU activation restores symmetry without compromising test accuracy. This constitutes the first geometry characterization of SCL under imbalances. Additionally, our analysis and experiments underscore the pivotal role of batch selection strategies in representation geometry. By proving necessary and sufficient conditions for mini-batch choices that ensure invariant symmetric representations, we introduce batch-binding as an efficient strategy that guarantees these conditions hold.  ( 2 min )
    A Computational Framework for Solving Wasserstein Lagrangian Flows. (arXiv:2310.10649v2 [cs.LG] CROSS LISTED)
    The dynamical formulation of the optimal transport can be extended through various choices of the underlying geometry (kinetic energy), and the regularization of density paths (potential energy). These combinations yield different variational problems (Lagrangians), encompassing many variations of the optimal transport problem such as the Schrödinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. Leveraging the dual formulation of the Lagrangians, we propose a novel deep learning based framework approaching all of these problems from a unified perspective. Our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for the single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.  ( 2 min )
    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach. (arXiv:2207.06949v4 [stat.ML] UPDATED)
    Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, with the aim of constructing well-established clusters whose elements are classified according to their similarity. The goal of this process is to provide a useful aid to the researcher that will help her/him to identify patterns among the data. In large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides a deep description of the most widely used clustering methodologies accompanied by useful presentations concerning suitable parameter selection and initializations. At the same time, this article not only reviews the major elements of the examined clustering techniques but also emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets, revealing their weaknesses and capabilities in terms of accuracy and complexity when confronted with discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques in accordance with the dataset's size.  ( 3 min )
    Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data. (arXiv:1811.11479v2 [cs.LG] UPDATED)
    On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. To this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than that of a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to become non-IID across devices, which commonly degrades the performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where each device collectively trains a generative model, and thereby augments its local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.  ( 2 min )
    Optimality Guarantees for Particle Belief Approximation of POMDPs. (arXiv:2210.05015v5 [cs.AI] UPDATED)
    Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.  ( 3 min )
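The particle belief transition described in the abstract above can be illustrated with a minimal sketch of one likelihood-weighted update. Everything named here is an assumption made for illustration and is not taken from the paper: the 1-D state, the `noisy_shift` generative model, the Gaussian observation density, and the particle count `C = 100`. The point is only that one belief step calls the generative model and the observation density once per particle, which is where the factor of $\mathcal{O}(C)$ comes from.

```python
import math
import random


def gaussian_likelihood(obs, state, sigma=1.0):
    """Hypothetical observation density Z(o | s): a Gaussian centred on the state."""
    z = (obs - state) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def particle_belief_step(particles, weights, action, obs, transition, likelihood):
    """One particle-belief transition: propagate every particle, then reweight.

    The generative model and the observation density are each called once
    per particle, so the cost of a step grows linearly with the particle count.
    """
    new_particles = [transition(s, action) for s in particles]
    new_weights = [w * likelihood(obs, s) for w, s in zip(weights, new_particles)]
    total = sum(new_weights)
    if total == 0.0:
        # Degenerate case: no particle explains the observation; fall back to uniform.
        new_weights = [1.0 / len(new_particles)] * len(new_particles)
    else:
        new_weights = [w / total for w in new_weights]
    return new_particles, new_weights


rng = random.Random(0)


def noisy_shift(state, action):
    """Hypothetical generative model: move by `action` plus small Gaussian noise."""
    return state + action + rng.gauss(0.0, 0.1)


C = 100  # number of particles (illustrative choice)
particles = [rng.uniform(-5.0, 5.0) for _ in range(C)]
weights = [1.0 / C] * C
particles, weights = particle_belief_step(
    particles, weights, action=1.0, obs=2.0,
    transition=noisy_shift, likelihood=gaussian_likelihood,
)
belief_mean = sum(w * s for w, s in zip(weights, particles))
```

An MDP planner that treats the pair `(particles, weights)` as its state would call `particle_belief_step` wherever it would otherwise sample a state transition.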
    Variational Inference for SDEs Driven by Fractional Noise. (arXiv:2310.12975v1 [cs.LG])
    We present a novel variational framework for performing inference in (neural) stochastic differential equations (SDEs) driven by Markov-approximate fractional Brownian motion (fBM). SDEs offer a versatile tool for modeling real-world continuous-time dynamic systems with inherent noise and randomness. Combining SDEs with the powerful inference capabilities of variational methods enables the learning of representative function distributions through stochastic gradient descent. However, conventional SDEs typically assume the underlying noise to follow a Brownian motion (BM), which hinders their ability to capture long-term dependencies. In contrast, fractional Brownian motion (fBM) extends BM to encompass non-Markovian dynamics, but existing methods for inferring fBM parameters are either computationally demanding or statistically inefficient. In this paper, building upon the Markov approximation of fBM, we derive the evidence lower bound essential for efficient variational inference of posterior path measures, drawing from the well-established field of stochastic analysis. Additionally, we provide a closed-form expression to determine optimal approximation coefficients. Furthermore, we propose the use of neural networks to learn the drift, diffusion and control terms within our variational posterior, leading to the variational training of neural-SDEs. In this framework, we also optimize the Hurst index, governing the nature of our fractional noise. Beyond validation on synthetic data, we contribute a novel architecture for variational latent video prediction, an approach that, to the best of our knowledge, enables the first variational neural-SDE application to video perception.  ( 3 min )
    The Kernel Density Integral Transformation. (arXiv:2309.10194v2 [stat.ML] UPDATED)
    Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering protection from the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.  ( 2 min )
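The two limiting cases named in the abstract above can be checked with a small sketch. It assumes one plausible reading of the transformation: integrate a Gaussian kernel density estimate up to each value and rescale so the sample minimum maps to 0 and the maximum to 1. The kernel choice, the normalization, the function names and the toy data are all assumptions here; the paper's exact definition and bandwidth parametrization may differ.

```python
import math


def gaussian_kde_cdf(x, data, bandwidth):
    """Integral of a Gaussian KDE from -infinity to x (mean of normal CDFs)."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in data) / len(data)


def kd_integral_transform(values, data, bandwidth):
    """Map values to [0, 1] via the KDE integral, normalized between min and max of data."""
    lo = gaussian_kde_cdf(min(data), data, bandwidth)
    hi = gaussian_kde_cdf(max(data), data, bandwidth)
    return [(gaussian_kde_cdf(v, data, bandwidth) - lo) / (hi - lo) for v in values]


data = [0.0, 1.0, 2.0, 10.0]  # toy feature column with one outlier

# Very wide kernel: the KDE integral is nearly linear over the data range,
# so the result approaches min-max scaling, i.e. (x - min) / (max - min).
wide = kd_integral_transform(data, data, bandwidth=1e3)

# Very narrow kernel: the KDE integral approaches the empirical CDF,
# so the result approaches a rank-based quantile transformation.
narrow = kd_integral_transform(data, data, bandwidth=1e-3)
```

With this toy column, `wide` stays close to the min-max values, so the outlier compresses the other points toward 0, while `narrow` spaces the four points evenly by rank. Intermediate bandwidths interpolate between the two behaviours.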
    PAC Prediction Sets Under Label Shift. (arXiv:2310.12964v1 [stat.ML])
    Prediction sets capture uncertainty by predicting sets of labels rather than individual labels, enabling downstream decisions to conservatively account for all plausible outcomes. Conformal inference algorithms construct prediction sets guaranteed to contain the true label with high probability. These guarantees fail to hold in the face of distribution shift, which is precisely when reliable uncertainty quantification can be most useful. We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting. This method estimates the predicted probabilities of the classes in a target domain, as well as the confusion matrix, then propagates uncertainty in these estimates through a Gaussian elimination algorithm to compute confidence intervals for importance weights. Finally, it uses these intervals to construct prediction sets. We evaluate our approach on five datasets: the CIFAR-10, ChestX-Ray and Entity-13 image datasets, the tabular CDC Heart dataset, and the AGNews text dataset. Our algorithm satisfies the PAC guarantee while producing smaller, more informative, prediction sets compared to several baselines.  ( 2 min )
    The Adaptive $\tau$-Lasso: Robustness and Oracle Properties. (arXiv:2304.09310v2 [stat.ML] UPDATED)
    This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets subject to gross contamination in the response variables and covariates (explanatory variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points. It also incorporates an adaptive $\ell_1$-norm penalty term, which enables the selection of relevant variables and reduces the bias associated with large true regression coefficients. More specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each regression coefficient. For a fixed number of predictors $p$, we show that the adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection consistency and asymptotic normality. Asymptotic normality applies only to the entries of the regression vector corresponding to the true support, assuming knowledge of the true regression vector support. We characterize its robustness via the finite-sample breakdown point and the influence function. We carry out extensive simulations and observe that the class of $\tau$-Lasso estimators exhibits robustness and reliable performance in both contaminated and uncontaminated data settings. We also validate our theoretical findings on robustness properties through simulation experiments. In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance in terms of prediction and variable selection accuracy compared to other competing regularized estimators for all scenarios considered in this study. Therefore, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively employed for a variety of sparse linear regression problems, particularly in high-dimensional settings and when the data is contaminated by outliers and high-leverage points.  ( 3 min )
    URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates. (arXiv:2307.03810v2 [cs.LG] UPDATED)
    Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .  ( 2 min )
    Sequential Gibbs Posteriors with Applications to Principal Component Analysis. (arXiv:2310.12882v1 [stat.ME])
    Gibbs posteriors are proportional to a prior distribution multiplied by an exponentiated loss function, with a key tuning parameter weighting information in the loss relative to the prior and providing a control of posterior uncertainty. Gibbs posteriors provide a principled framework for likelihood-free Bayesian inference, but in many situations, including a single tuning parameter inevitably leads to poor uncertainty quantification. In particular, regardless of the value of the parameter, credible regions have far from the nominal frequentist coverage even in large samples. We propose a sequential extension to Gibbs posteriors to address this problem. We prove the proposed sequential posterior exhibits concentration and a Bernstein-von Mises theorem, which holds under easy to verify conditions in Euclidean space and on manifolds. As a byproduct, we obtain the first Bernstein-von Mises theorem for traditional likelihood-based Bayesian posteriors on manifolds. All methods are illustrated with an application to principal component analysis.  ( 2 min )
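The first sentence of the abstract above can be written as a formula. The notation is an assumption chosen for illustration and is not taken from the paper: $\pi$ is the prior, $\ell_n$ the loss on data $x_{1:n}$, and $\eta > 0$ the single tuning parameter that weights the loss against the prior.

```latex
\pi_\eta(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,\exp\{-\eta\,\ell_n(\theta; x_{1:n})\}
```

Taking $\ell_n$ to be a negative log-likelihood and $\eta = 1$ recovers the usual Bayesian posterior, which is why a single $\eta$ acts as a global knob on posterior spread.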
    A path-norm toolkit for modern networks: consequences, promises and challenges. (arXiv:2310.01225v2 [stat.ML] UPDATED)
    This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.  ( 2 min )
    Evaluating Superhuman Models with Consistency Checks. (arXiv:2306.09983v3 [cs.LG] UPDATED)
    If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.  ( 2 min )
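The monotonicity example in the abstract above suggests a simple kind of check that needs no ground truth. The sketch below is illustrative only: the function name and the forecast values are made up, and it is not the paper's evaluation code. It flags consecutive forecasts of a record that can only stay the same or improve but is predicted to get worse.

```python
def monotonicity_violations(forecasts, non_increasing=True):
    """Return (earlier_year, later_year) pairs where the forecast moves the wrong way.

    forecasts: iterable of (year, value) pairs.
    non_increasing=True means the quantity may only stay equal or decrease over
    time (e.g. a best sprint time); False means it may only stay equal or increase.
    """
    ordered = sorted(forecasts)
    violations = []
    for (year0, value0), (year1, value1) in zip(ordered, ordered[1:]):
        wrong_way = value1 > value0 if non_increasing else value1 < value0
        if wrong_way:
            violations.append((year0, year1))
    return violations


# Hypothetical model forecasts of a sprint record in seconds (invented numbers).
forecasts = [(2025, 9.58), (2030, 9.56), (2035, 9.57), (2040, 9.55)]
violations = monotonicity_violations(forecasts)
```

Here the 2035 forecast is slower than the 2030 one, which cannot happen for a record, so the pair `(2030, 2035)` is reported even though nobody knows what the true future record will be.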
    Model-agnostic variable importance for predictive uncertainty: an entropy-based approach. (arXiv:2310.12842v1 [stat.ML])
    In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.  ( 2 min )
    Log-density gradient covariance and automatic metric tensors for Riemann manifold Monte Carlo methods. (arXiv:2211.01746v2 [stat.CO] UPDATED)
    A metric tensor for Riemann manifold Monte Carlo particularly suited for non-linear Bayesian hierarchical models is proposed. The metric tensor is built from symmetric positive semidefinite log-density gradient covariance (LGC) matrices, which are also proposed and further explored here. The LGCs generalize the Fisher information matrix by measuring the joint information content and dependence structure of both a random variable and the parameters of said variable. Consequently, positive definite Fisher/LGC-based metric tensors may be constructed not only from the observation likelihoods as is current practice, but also from arbitrarily complicated non-linear prior/latent variable structures, provided the LGC may be derived for each conditional distribution used to construct said structures. The proposed methodology is highly automatic and allows for exploitation of any sparsity associated with the model in question. When implemented in conjunction with a Riemann manifold variant of the recently proposed numerical generalized randomized Hamiltonian Monte Carlo processes, the proposed methodology is highly competitive, in particular for the more challenging target distributions associated with Bayesian hierarchical models.  ( 2 min )
    Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter. (arXiv:2303.14090v2 [astro-ph.CO] UPDATED)
    Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, while the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has led to the emergence of machine learning algorithms for baryon inpainting as an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieval of scatter that traces the target simulation's distribution.  ( 3 min )
    EDGI: Equivariant Diffusion for Planning with Embodied Agents. (arXiv:2303.12410v2 [cs.LG] UPDATED)
    Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group Z, and the object permutation group Sn. EDGI follows the Diffuser framework (Janner et al., 2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new SE(3)xZxSn-equivariant diffusion model that supports multiple representations. We integrate this model in a planning loop, where conditioning and classifier guidance let us softly break the symmetry for specific tasks as needed. On object manipulation and navigation tasks, EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.  ( 2 min )
    Generative Flow Networks as Entropy-Regularized RL. (arXiv:2310.12934v1 [cs.LG])
    The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive against established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.  ( 2 min )
    Piecewise Deterministic Markov Processes for Bayesian Neural Networks. (arXiv:2302.08724v2 [stat.ML] UPDATED)
    Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing assumptions of independence and posterior form that are violated in practice. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to their incompatibility with subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though they introduce model-specific inhomogeneous Poisson Processes (IPPs) which are difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs. Experimentation illustrates how inference with these methods is computationally feasible, can improve predictive accuracy and MCMC mixing performance, and can provide informative uncertainty measurements when compared against other approximate inference schemes.  ( 2 min )
    Deep Discriminative to Kernel Density Networks for Calibrated Inference. (arXiv:2201.13001v6 [cs.LG] UPDATED)
    Deep discriminative approaches like random forests and deep neural networks have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly when it comes to ensuring confidence calibration for both in-distribution and out-of-distribution data points. Many popular methods for in-distribution (ID) calibration, such as isotonic regression and Platt's sigmoidal regression, exhibit excellent ID calibration performance but often at the cost of classification accuracy. Moreover, these methods are not calibrated for the entire feature space, leading to overconfidence in the case of out-of-distribution (OOD) samples. In this paper, we leverage the fact that deep models, including both random forests and deep-nets, learn internal representations which are unions of polytopes with affine activation functions to conceptualize them both as partitioning rules of the feature space. We replace the affine function in each polytope populated by the training data with a Gaussian kernel. We propose sufficient conditions for our proposed methods to be consistent estimators of the corresponding class conditional densities. Moreover, our experiments on both tabular and vision benchmarks show that the proposed approaches obtain well-calibrated posteriors while mostly preserving or improving the classification accuracy of the original algorithm in the in-distribution region, and extrapolate beyond the training data to handle out-of-distribution inputs appropriately.  ( 3 min )
    Neurosymbolic Grounding for Compositional World Models. (arXiv:2310.12690v1 [cs.LG])
    We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CG), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.  ( 2 min )
    DCSI -- An improved measure of cluster separability based on separation and connectedness. (arXiv:2310.12806v1 [stat.ML])
    Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.  ( 2 min )
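For readers unfamiliar with the ARI used above as the performance measure, the following is a minimal pure-Python sketch of the standard pair-counting formula. It is a generic textbook implementation and not code from the paper; the function name and the convention of returning 1.0 for degenerate inputs are choices made here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat the labelings as identical

    # Pairs of items that fall in the same class AND the same cluster.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs of items that share a class, and pairs that share a cluster.
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_true * same_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (same_true + same_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one group
    return (same_both - expected) / (max_index - expected)


perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same partition, renamed labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # every class split across clusters
```

The index is 1 for identical partitions regardless of label names, around 0 for chance-level agreement, and negative when agreement is worse than chance, as in the `crossed` example.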
    Compression of Recurrent Neural Networks using Matrix Factorization. (arXiv:2310.12688v1 [cs.LG])
    Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with no or little performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.  ( 2 min )
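The trade-off behind low-rank compression can be made concrete with a little arithmetic. The sketch below only counts parameters for a generic rank-r factorization of one weight matrix; it is not the paper's Rank-Tuning procedure, and the 512 x 512 matrix size is a hypothetical example.

```python
def factorized_params(rows, cols, rank):
    """Parameters needed to store W (rows x cols) as A (rows x rank) times B (rank x cols)."""
    return rank * (rows + cols)


def compression_rate(rows, cols, rank):
    """Dense parameter count divided by factorized parameter count."""
    return (rows * cols) / factorized_params(rows, cols, rank)


def max_rank_for_rate(rows, cols, target_rate):
    """Largest rank whose compression rate is still at least target_rate (0 if none)."""
    return int((rows * cols) / (target_rate * (rows + cols)))


rows, cols = 512, 512  # hypothetical recurrent weight matrix
rate_at_16 = compression_rate(rows, cols, rank=16)
rank_for_14x = max_rank_for_rate(rows, cols, target_rate=14)
```

For this matrix, rank 16 stores 16,384 values instead of 262,144, and any rank up to 18 keeps the compression rate at 14x or better. Choosing a different rank per matrix, as the abstract describes, amounts to spending this budget unevenly across layers.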
    Conditional Density Estimations from Privacy-Protected Data. (arXiv:2310.12781v1 [stat.ML])
    Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only the privatized data during statistical analysis makes it computationally challenging to perform valid inferences on parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. Specifically, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.  ( 2 min )
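The "calibrated noise" in the abstract above can be illustrated with the classic Laplace mechanism for a counting query. This is a generic textbook sketch, not the mechanism or inference method used in the paper; the data, the predicate and the privacy budget `epsilon = 1.0` are invented for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(data, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1 / epsilon is sufficient.
    """
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 17, 52, 68, 30]  # made-up confidential records
releases = [private_count(ages, lambda a: a >= 30, epsilon=1.0, rng=rng) for _ in range(5000)]
mean_release = sum(releases) / len(releases)
```

A single release can be far from the true count of 5, and only the released value is visible to the analyst. Valid inference therefore has to model the noise mechanism explicitly, which is the problem the abstract addresses.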
    Causal Similarity-Based Hierarchical Bayesian Models. (arXiv:2310.12595v1 [cs.LG])
    The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each task's data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and applicability to real world problems through a range of experiments on simulated and real data.  ( 2 min )
    STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models. (arXiv:2310.12667v1 [stat.ML])
    In this paper we propose STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.  ( 2 min )
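As background for the discretized Langevin diffusion mentioned above, here is a minimal sketch of the plain, isotropic unadjusted Langevin update on a 1-D standard normal target. It deliberately omits what the abstract describes as new (the anisotropic stepsize and gradient-informed covariance), and the target, step size and chain length are illustrative assumptions.

```python
import math
import random


def langevin_chain(grad_log_density, x0, step, n_steps, rng):
    """Unadjusted Langevin: x <- x + step * grad log p(x) + sqrt(2 * step) * N(0, 1)."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


rng = random.Random(0)
# Target: standard normal, so grad log p(x) = -x. For an energy-based model
# p(x) proportional to exp(-E(x)), this gradient would be -dE/dx instead.
samples = langevin_chain(lambda x: -x, x0=3.0, step=0.05, n_steps=20000, rng=rng)

kept = samples[1000:]  # discard the start of the chain as burn-in
sample_mean = sum(kept) / len(kept)
sample_var = sum((s - sample_mean) ** 2 for s in kept) / len(kept)
```

With a single scalar `step`, every coordinate moves at the same scale. An anisotropic variant would replace that scalar with a per-direction scaling, which is the part this sketch leaves out.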
    Generating collective counterfactual explanations in score-based classification via mathematical optimization. (arXiv:2310.12822v1 [stat.ML])
    Due to the increasing use of Machine Learning models in high stakes decision making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical to the entire dataset to have the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.  ( 3 min )
    On the Optimization and Generalization of Multi-head Attention. (arXiv:2310.12680v1 [cs.LG])
    The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.  ( 2 min )
    How a student becomes a teacher: learning and forgetting through Spectral methods. (arXiv:2310.12612v1 [cs.LG])
    In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network. This latter should be to some extent reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the student candidate network. Unfortunately, state-of-the-art conventional learning techniques could not help in identifying the existence of such an invariant subnetwork, due to the inherent degree of non-convexity that characterizes the examined problem. In this work, we take a leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors with negligible increase in terms of computational and complexity load, as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure, that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When pruning unimportant nodes of the trained student, as follows a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.  ( 3 min )
    Constrained Reweighting of Distributions: an Optimal Transport Approach. (arXiv:2310.12447v1 [stat.ML])
    We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing a nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.  ( 2 min )
    Canonical normalizing flows for manifold learning. (arXiv:2310.12743v1 [stat.ML])
    Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such a manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used to optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective-mapping. Recent approaches were able to enforce that density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective-mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Towards this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and orthogonal basis functions. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and consequently a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.  ( 2 min )
    Approximate information maximization for bandit games. (arXiv:2310.12563v1 [stat.ML])
    Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.  ( 2 min )
    Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-norm Linear Regression. (arXiv:2310.12437v1 [math.ST])
    We study the performance of empirical risk minimization on the $p$-norm linear regression problem for $p \in (1, \infty)$. We show that, in the realizable case, under no moment assumptions, and up to a distribution-dependent constant, $O(d)$ samples are enough to exactly recover the target. Otherwise, for $p \in [2, \infty)$, and under weak moment assumptions on the target and the covariates, we prove a high probability excess risk bound on the empirical risk minimizer whose leading term matches, up to a constant that depends only on $p$, the asymptotically exact rate. We extend this result to the case $p \in (1, 2)$ under mild assumptions that guarantee the existence of the Hessian of the risk at its minimizer.  ( 2 min )
    Explanation-Based Training with Differentiable Insertion/Deletion Metric-Aware Regularizers. (arXiv:2310.12553v1 [cs.LG])
    The quality of explanations for the predictions of complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how correctly the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both insertion and deletion scores of the explanations while keeping their predictive accuracy. Since the original insertion and deletion metrics are indifferentiable with respect to the explanations and directly unavailable for gradient-based optimization, we extend the metrics to be differentiable and use them to formalize insertion and deletion metric-based regularizers. The experimental results on image and tabular datasets show that the deep neural networks-based predictors fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easy-to-interpret explanations while keeping high predictive accuracy.  ( 2 min )
    Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights. (arXiv:2310.12462v1 [cs.LG])
    In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks. However, with their widespread adoption, concerns regarding the security and privacy of the data processed by these models have arisen. In this paper, we address a pivotal question: Can the data fed into transformers be recovered using their attention weights and outputs? We introduce a theoretical framework to tackle this problem. Specifically, we present an algorithm that aims to recover the input data $X \in \mathbb{R}^{d \times n}$ from given attention weights $W = QK^\top \in \mathbb{R}^{d \times d}$ and output $B \in \mathbb{R}^{n \times n}$ by minimizing the loss function $L(X)$. This loss function captures the discrepancy between the expected output and the actual output of the transformer. Our findings have significant implications for the Localized Layer-wise Mechanism (LLM), suggesting potential vulnerabilities in the model's design from a security and privacy perspective. This work underscores the importance of understanding and safeguarding the internal workings of transformers to ensure the confidentiality of processed data.  ( 2 min )
    Neural Likelihood Approximation for Integer Valued Time Series Data. (arXiv:2310.12544v1 [stat.ML])
    Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is difficult due to intractability of the likelihood; current methods, based on simulations of the underlying model, can be so computationally expensive as to be prohibitive. In this paper we construct a neural likelihood approximation for integer valued time series data using causal convolutions, which allows us to evaluate the likelihood of the whole time series in parallel. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups in situations where current methods struggle.  ( 2 min )
    Closed-Form Diffusion Models. (arXiv:2310.12395v1 [cs.LG])
    Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via score-matching. The error in this approximation promotes generalization, but neural SGMs are costly to train and sample, and the effective regularization this error provides is not well-understood theoretically. In this work, we instead explicitly smooth the closed-form score to obtain an SGM that generates novel samples without training. We analyze our model and propose an efficient nearest-neighbor-based estimator of its score function. Using this estimator, our method achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.  ( 2 min )
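    The closed-form score mentioned in the abstract can be illustrated with a small sketch. It assumes the perturbed target is the training set convolved with isotropic Gaussian noise of scale sigma, in which case the score is a softmax-weighted average of the directions toward each training point. The function name `closed_form_score` is hypothetical, and this is the unsmoothed score that memorizes the data, not the smoothed estimator the paper proposes.

```python
import math

def closed_form_score(x, train, sigma):
    """Score of the training set convolved with N(0, sigma^2 I), evaluated at x.

    Illustrative helper (name is an assumption, not from the paper).
    """
    # Squared distance from x to every training point.
    sq = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in train]
    # Softmax weights over -d^2 / (2 sigma^2), shifted for numerical stability.
    logits = [-d / (2 * sigma ** 2) for d in sq]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    w = [v / z for v in w]
    # Weighted average of (p - x) / sigma^2 over training points.
    return [
        sum(wi * (p[k] - x[k]) for wi, p in zip(w, train)) / sigma ** 2
        for k in range(len(x))
    ]
```

    With a single training point the score points straight at that point, which is one way to see why sampling with the exact score reproduces the training data.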
    Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach. (arXiv:2310.12428v1 [stat.ML])
    We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.  ( 2 min )
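    The idea of rewriting a forest prediction as a weighted average of training labels can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes regression trees fitted on all training points without bootstrapping, so each tree predicts the mean label of the leaf the query falls into. The names `rf_weights` and `rf_predict` and the leaf-id input layout are hypothetical.

```python
def rf_weights(train_leaves, query_leaves):
    """Per-training-point weights for one query.

    train_leaves[t][i]: leaf id of training point i in tree t.
    query_leaves[t]:    leaf id of the query point in tree t.
    Assumes every query leaf contains at least one training point.
    """
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, q in zip(train_leaves, query_leaves):
        # Training points sharing the query's leaf in this tree.
        members = [i for i, leaf in enumerate(leaves) if leaf == q]
        for i in members:
            weights[i] += 1.0 / (n_trees * len(members))
    return weights

def rf_predict(weights, y):
    """Forest prediction as a weighted average of training labels."""
    return sum(w * yi for w, yi in zip(weights, y))
```

    The weights sum to one, and each weight is itself an attribution of the prediction to a training observation, which is the kind of explanation the abstract contrasts with feature-wise methods like SHAP.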
    Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm. (arXiv:2310.12285v1 [stat.ME])
    High-dimensional longitudinal data is increasingly used in a wide range of scientific studies. However, there are few statistical methods for high-dimensional linear mixed models (LMMs), as most Bayesian variable selection or penalization methods are designed for independent observations. Additionally, the few available software packages for high-dimensional LMMs suffer from scalability issues. This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs. We use empirical Bayes estimators of hyperparameters for increased flexibility and an Expectation-Conditional-Minimization (ECM) algorithm for computationally efficient maximum a posteriori probability (MAP) estimation of parameters. The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation. We illustrate Linear Mixed Modeling with PaRtitiOned empirical Bayes ECM (LMM-PROBE) in simulation studies evaluating fixed and random effects estimation along with computation time. A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time.  ( 2 min )
    Preference Optimization for Molecular Language Models. (arXiv:2310.12304v1 [stat.ML])
    Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not \emph{a priori} encode certain preferences a chemist may desire. We investigate the use of fine-tuning using Direct Preference Optimization to better align generated molecules with chemist preferences. Our findings suggest that this approach is simple, efficient, and highly effective.  ( 2 min )
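    For readers unfamiliar with Direct Preference Optimization, the per-pair loss can be sketched as below. The inputs are assumed to be summed log-probabilities of a preferred and a dispreferred molecule under the model being tuned and under a frozen reference model; the function name and the default `beta` are illustrative choices, not values taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * log-ratio margin)."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

    When the tuned model matches the reference, the margin is zero and the loss is log 2; it falls as the model raises the preferred sample's likelihood relative to the reference.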

  • Open

    [D] Is lang chain the right solution?
    Hello, I would love to have an LLM that can provide answers (in chat format) based on some of the SQL DB data we have. I want it for an internal company project. I am by no means an expert, but I am decent at programming and want to build a system to get answers in chat format. My understanding is that training LLMs from the ground up is prohibitively expensive and that LangChain is a sort of hybrid, efficient solution. Please suggest any other solutions. Also, would LangChain being a company and not open source pose a problem in terms of copyright? Thanks! submitted by /u/betelgeuseian [link] [comments]  ( 9 min )
    [R] MemGPT: Towards LLMs as Operating Systems - UC Berkeley 2023 - Is able to create unbounded/infinite LLM context!
    Paper: https://arxiv.org/abs/2310.08560 Github: https://github.com/cpacker/MemGPT Blog: https://memgpt.ai/ Youtube: https://youtu.be/QQ2QOPWZKVc?si=_bSSXU9EQE0FP64h MemGPT 🧠 Giving AI Unlimited Prompt Size (Big Step Towards AGI?) by Matthew Berman / Must watch, and he also explains how to install it! Overview: LLMs are increasingly being used for perpetual chats. Limited context lengths make perpetual chat challenging. MemGPT manages a virtual context (inspired by virtual memory in operating systems) to create unbounded LLM context. With MemGPT, we demonstrate that LLMs can be taught to manage their own memory! Abstract: Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversa…  ( 9 min )
    [D] Some beginner questions about Whisper for transcription
    Hi, I am a Mac user. I am trying to use whisper.cpp, downloaded from its GitHub repository. I don't know much about Python or coding, so I basically followed this guide to install and use it. I downloaded the large model to try it. I am using it for non-English languages, and I want to use it for language learning purposes so I can understand what is being said in an Instagram story or a YouTube video (without subtitles) or a TV series or an extract of a movie, etc. I was using MacWhisper, but I wanted to try the pro models for non-English languages and I don't want to pay for the pro features (for now). My question is: all of the files that I want to transcribe are video files with the .mp4 extension. Can I also transcribe those with whisper? If not, and if I can only transcribe audio files, can it be .mp3? I understand that I need to install and use ffmpeg. Does it support mp3? Also, as I understand it, the transcribed text will appear in the terminal. Can I export it as .srt or PDF? Thanks submitted by /u/toughytough [link] [comments]  ( 9 min )
    [D] Transformers are basically CNNs?
    I've watched an interesting video: Deriving the Ultimate Neural Network Architecture from Scratch. It's about how to arrive at the transformer architecture when you have an understanding of CNNs. The crux of it is the idea of pairwise convolutional layers. The first layer applies not to the sequence of words itself, but to all pairs of words in the sentence. This ensures that each relation between words that are far from each other is taken into account. The next convolutional layer applies to all pairs of results of the previous one. This way longer subsequences of words are factored in. My question is: are there any articles on how transformers were invented? I see a lot of explanations of the original paper, but at best they all answer the question of how transformers work. But why is the architecture the way it is? Was it discovered the way the video describes? Or was the path more convoluted? I'd like to know more about this connection. Anyway, it would be great to figure out in full detail how these pairwise layers are related to the concepts of query, key, and value. Here's what the author of the video wrote in the comments: Yeah it's a term I made up so you won't find it in any sources, sorry about that. Usually sources will just talk about self attention in terms of key, query and value lookups, so you can look at those to get a more detailed understanding of the transformer. The value transform is equivalent to the linear representation function I use in the pairwise convolution, the key and query attention scores are equivalent to the bi-linear form scoring function I use (with the bi-linear form weight matrix given by Q^TK). I chose to use this unusual terminology because, personally, I feel the key, query and value terminology comes out of nowhere, and I wanted to connect the transformer more directly to its predecessor (the CNN). submitted by /u/Veson [link] [comments]  ( 10 min )
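    The equivalence the video's author describes, between query/key scoring and a bilinear form with weight matrix Q^T K, can be checked with a small sketch. This is plain single-head self-attention written with lists, under the assumption that Wq, Wk and Wv are the usual learned projection matrices; all helper names are illustrative and none of this comes from the video itself.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[dot(row, col) for col in Bt] for row in A]

def bilinear_score(xi, xj, Wq, Wk):
    """Unscaled attention score written as xi^T (Wq^T Wk) xj."""
    return dot(xi, matvec(matmul(transpose(Wq), Wk), xj))

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a list of token vectors X."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]  # the "linear representation" of each token
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        # Every token is scored against every other token: all pairs.
        scores = [dot(q, k) / scale for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi / z * v[d] for wi, v in zip(w, V)) for d in range(len(V[0]))])
    return out
```

    Here `bilinear_score(xi, xj, Wq, Wk)` returns the same number as `dot(matvec(Wq, xi), matvec(Wk, xj))`, so the query and key projections only matter through their product.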
    [R] Does this learning curve show any serious under/overfitting problems?
    I'm trying to fit a multivariate LSTM model to time series data to predict future values for one relatively noisy series. I noticed that the loss (MSE in this case) is pretty high given that the data has been standardized beforehand. So I really have two questions: why is the MSE so high, and is the learning curve indicative of any obvious problems? Thank you! https://preview.redd.it/r9bel6p7kfvb1.png?width=547&format=png&auto=webp&s=4eee53aa8005da8a89f330f6e98fe6cadde3467e submitted by /u/DifferenceUnhappy393 [link] [comments]  ( 9 min )
    [Discussion] Is the deadly triad real?
    Sutton and Barto’s textbook mentions that combining off-policy learning, bootstrapping, and function approximation leads to extreme instability and should be avoided. Yet when I encounter a reinforcement learning problem in the wild and look at how people go about solving it, if someone’s solution involves bootstrapping, more often than not it’s some variation of deep Q-learning. Why is this? submitted by /u/BiasedEstimators [link] [comments]
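    To make the three ingredients concrete, here is a minimal sketch of one semi-gradient Q-learning update with linear function approximation, which combines all of them in a single step. The feature-vector inputs and the function names are assumptions made for illustration; deep Q-learning replaces the linear model with a neural network and typically adds stabilisers such as replay buffers and target networks.

```python
def q_value(w, features):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def q_learning_step(w, phi_sa, reward, next_phis, alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning update.

    phi_sa:    features of the (state, action) pair that was taken.
    next_phis: features of every action available in the next state
               (empty list if the next state is terminal).
    """
    # Bootstrapping: the target reuses the current estimate of the next state.
    # Off-policy: max over actions, whatever the behaviour policy does next.
    next_q = max((q_value(w, p) for p in next_phis), default=0.0)
    td_error = reward + gamma * next_q - q_value(w, phi_sa)
    # Function approximation: shared weights move, not a single table entry.
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi_sa)]
```

    Because the weights are shared, an update for one state-action pair also shifts the bootstrapped targets of others, which is where the instability the textbook warns about can come from.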
    [P] building a D&D NPC
    Hey everyone, I'm learning ML but I'm barely scratching the surface of the terminology. Two years ago I couldn't code anything, but through school (Python, SQL and R) I learned the fundamentals. I also have access to code academy. My current program is very machine learning/deep learning focused. On the side I DM a D&D game. Within the context of the world (Eberron), robots are common. With my ADHD and being a new DM, I want to outsource lore questions that might come up (that I would otherwise have to look up, slowing down the game). The concept is to have a GUI and have the player interact with the chat bot. I've gotten to a proof-of-concept workflow on Google Colab. Thanks to LangChain, I managed to ingest PDFs and a URL, put them in a directory, embed the text, load it into a vector DB, have the LLM pull from the vector DB, and answer the question. Now I don't know what to do. I tried to bring the Colab notebook onto Google Cloud, but now cloud is becoming a rabbit hole with Vertex and DocAI... and I don't want to deep dive into that if it's outside the scope of this "project". I'd appreciate any advice, links, etc. I got limited success in Botpress using a single PDF. It works but feels unsatisfying. N8N looks promising, but if it's not intuitive then I don't want to go down that road. If I posted in the wrong group, please direct me to the correct one. submitted by /u/work929 [link] [comments]  ( 9 min )
    [R] In-Context Pretraining: Language Modeling Beyond Document Boundaries
    https://arxiv.org/abs/2310.10638 "Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs' performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%)." submitted by /u/Parking-Priority6217 [link] [comments]  ( 9 min )
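    The document-ordering idea in the abstract can be illustrated with a toy sketch: visit every document exactly once, always stepping to the most similar unvisited one, so that neighbours in the resulting order are related. This is a brute-force greedy stand-in, assuming precomputed document embeddings and cosine similarity; the paper itself uses approximate nearest-neighbour search and its own graph traversal at a scale of billions of documents, and the function names here are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def order_documents(embeddings, start=0):
    """Greedy traversal: each document appears once, next to a similar one."""
    unvisited = set(range(len(embeddings)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        current = embeddings[order[-1]]
        # Most similar unvisited document; sorted() makes ties deterministic.
        nxt = max(sorted(unvisited), key=lambda j: cosine(current, embeddings[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

    Concatenating documents in the returned order and then chunking into context windows keeps related documents together without repeating any of them.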
    [R] Using Machine Learning to set parameters in sensors (College Project)
    Greetings, I'm in my 2nd year of college (Artificial Intelligence bachelor's degree), and currently working on a group project that will require machine learning. The project consists of managing and regulating the conditions (temperature, humidity, lighting, etc.) of the environment that surrounds important products (vaccines, human organs, etc.) during their transportation, using sensors installed in their transportation box. To make that possible, our group is planning to use a predictive machine learning model to prevent cases such as exposure to inappropriate temperature levels that could damage the product, and to take the appropriate measures to improve the environment before it reaches such dangerous scenarios. Therefore, I would like to know which tools and skills will be needed and helpful in order to achieve this goal. If you have any advice, that'll be very much appreciated. :) submitted by /u/Storm2003 [link] [comments]  ( 9 min )
    [R] 3D-GPT: A new method for procedural Text-to-3D model generation
    Researchers propose a new AI system called 3D-GPT that creates 3D models by combining natural language instructions and agents specialized for working with existing 3D modeling tools. 3D-GPT has predefined functions that make 3D shapes, and it tweaks parameters to build scenes. The key is getting the AI to understand instructions and pick the right tools. It has three main agents: a dispatcher that parses the text and picks generation functions; a conceptualizer that adds details missing from the description; and a modeler that sets parameters and outputs code to drive 3D software. By breaking modeling work down into steps, the agents can collab to match the descriptions. This is sort of like how a 3D modeling team of humans would work. The paper authors show it making simple scenes like "lush meadow with flowers" that fit the text. It also modifies scenes appropriately when given new instructions. I include some gifs of example outputs in my full summary. They look pretty good - I would say 2005-quality graphics. There are limits. It fully relies on existing generators, so quality is capped. Details and curves are iffy. It resorts to default shapes often instead of true understanding. And I doubt the verts and textures are well-optimized. The agent architecture seems to be really popular right now. This one shows some planning skills, which could extend to more creative tasks someday. TLDR: AI agents can team up to generate 3D models from text instructions. Works to some degree but limitations remain. Full summary. Paper here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] Bayesian Optimization-based Combinatorial Assignment
    Link: https://ojs.aaai.org/index.php/AAAI/article/view/25726/25498 Abstract: We study the combinatorial assignment domain, which includes combinatorial auctions and course allocation. The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning-based preference elicitation algorithms that aim to elicit only the most important information from agents. However, the main shortcoming of this prior work is that it does not model a mechanism's uncertainty over values for not yet elicited bundles. In this paper, we address this shortcoming by presenting a Bayesian optimization-based combinatorial assignment (BOCA) mechanism. Our key technical contribution is to integrate a method for capturing model uncertainty into an iterative combinatorial auction mechanism. Concretely, we design a new method for estimating an upper uncertainty bound that can be used to define an acquisition function to determine the next query to the agents. This enables the mechanism to properly explore (and not just exploit) the bundle space during its preference elicitation phase. We run computational experiments in several spectrum auction domains to evaluate BOCA's performance. Our results show that BOCA achieves higher allocative efficiency than state-of-the-art approaches. https://preview.redd.it/aeo36u3wldvb1.png?width=1288&format=png&auto=webp&s=2982547f8af51ed7195f49dbec9359fecba1693f ​ submitted by /u/Yossarian_1234 [link] [comments]  ( 9 min )
    [D] What is the latest method for models with multimodal outputs? How can the shared embedding used by a lot of multimodal models be dynamically "routed" to the proper modality during output?
    So a lot of multimodal models I've seen use a linear layer to transform encoded image/video/audio into the multimodal LLM's embedding space. This makes sense for the input, but how would output work? Normally you use a layer to convert the embedding to a softmax over the probabilities of possible output tokens. This makes sense for discrete outputs like tokens but not for continuous outputs like images or audio. submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] Is anyone else tired of “whatever OpenAI does is the best!” narrative?
    The title says it all. I agree that what they did is incredible and literally changed the AI landscape in the last couple of years. But I’m getting tired of everyone acting like OpenAI is the only one doing great research, with the twit-fluencers praising even the slightest peep from them. I don’t understand this fanaticism in the AI community. There are smart researchers doing smart things all over the world, but they don’t get even a fraction of the appreciation they deserve. And the strangest thing of all: ChatGPT is used as an oracle to evaluate models in research papers. Consistency models are extremely meh, and if they had not come out of OpenAI, people would’ve forgotten them a long time ago! Edit 1: I’m in grad school and that’s all a lot of students around me talk about/chase. I want to work on somewhat more fundamental problems, but I feel like I’m being left behind. Edit 2: This post is mostly a rant about academics obsessed with OpenAI research/products and LLMs. submitted by /u/mildlyphd [link] [comments]  ( 9 min )
    [P] Hacktoberfest Machine Learning Projects for JS/TS Developers 🎃
    Hey everyone, we have published an article about Hacktoberfest Projects 🎃 (medium.com) with a curated list of open-source machine learning GUI projects built with JavaScript or TypeScript. https://preview.redd.it/nr4jfbqoscvb1.png?width=1352&format=png&auto=webp&s=fbb2313aabf0a617b6e426f1fa5018946b7ed7f5 🔍 Finding machine learning projects that are suitable for JS/TS developers during Hacktoberfest can be daunting due to the overwhelming abundance of open-source projects. We’ve simplified this process, offering you a refined selection of opportunities where your coding skills can shine and make a real impact. The selection includes: Spotlight, our powerful tool for intuitively exploring unstructured datasets directly from dataframes; Iterative's CML (Continuous Machine Learning), a command-line interface tool designed to enhance continuous integration and delivery (CI/CD) workflows; Inclusive Code Reviews, a browser extension for improving online comments such as code reviews on GitHub or Azure DevOps; and BeatBridge, a music player with a recommendation engine. Each project offers a unique blend of challenges and learning opportunities, inviting you to contribute and grow your skills and knowledge in the dynamic world of open source. Choose a project that resonates with you, select an issue, and make an impact 🚀. submitted by /u/DocBrownMS [link] [comments]  ( 9 min )
    [R] AgentTuning: Enabling Generalized Agent Abilities for LLMs - Tsinghua University 2023 - Agent-tuned open model comparable to GPT-3.5-Turbo on unseen agent tasks!
    Paper: https://arxiv.org/abs/2310.12823 Github: https://github.com/THUDM/AgentTuning Model: https://huggingface.co/THUDM/agentlm-70b Abstract: Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is lack of research focusing on improving the agent capabilities of L…  ( 9 min )
    [D] People working for (relatively) large organisations. How are LLMs accessed by employees within your organisation right now?
    I'm wondering whether LLMs within your organisation are widely used (including by non-programmers), and in an (official) capacity that prevents OpenAI/Microsoft or another third party from using the input. Here, I'm talking about access by a wide variety of employees, not use as part of a data pipeline that has no user interface and only performs one job. Does your organisation have a custom-built interface with enterprise access to an LLM? Does it use one of the open-source interfaces, or does it provide access through e.g. Microsoft Copilot? What about access to GitHub Copilot (for programmers)? Or does your organisation have some kind of SaaS solution? If you have some kind of RAG within the organisation that isn't built into a product, what sort of stack do you use? Do you use OpenAI plugins to access it? submitted by /u/Background_Claim7907 [link] [comments]  ( 9 min )
    [D] Thoughts on Open-Domain QnA Systems?
    Been really interested in Open-Domain Question Answering these days and saw some interesting new models apart from the typical Retriever-Reader e.g. Generator-Retriever-Generator. Anyone particularly excited about anything new in the field - some new technique/model etc.? submitted by /u/Aggravating-Floor-38 [link] [comments]  ( 9 min )
    [N] State of AI Report 2023
    The State of AI Report for this year is out : https://www.stateof.ai/2023-report-launch A 160-slide presentation/report which seems quite exhaustive in the discussed topics, and provides a good view of the "hottest" research axes this year. Previous reports (yearly since 2019) are available on their website and have been generally well received in this sub. submitted by /u/ElkoSoltius [link] [comments]  ( 9 min )
    [R] Large Language Models as Analogical Reasoners
    https://arxiv.org/abs/2310.01714 "Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, Analogical Prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench." https://preview.redd.it/f9azq40pwavb1.jpg?width=6390&format=pjpg&auto=webp&s=0af3de7925a6ef8f442e40f952849db2f544c3a7 submitted by /u/Parking-Priority6217 [link] [comments]
    [R] Large Language Models as Optimizers
    https://arxiv.org/abs/2309.03409 "Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks." submitted by /u/Parking-Priority6217 [link] [comments]  ( 9 min )
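    The loop described in the abstract above (generate new solutions from a prompt of scored past solutions, evaluate them, add them to the prompt, repeat) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: `stub_proposer` is a hypothetical stand-in for the LLM call, the objective is a toy integer function, and listing the top solutions in ascending score order is just one choice made for this sketch.

```python
from typing import Callable, List, Tuple

Scored = Tuple[int, float]  # (solution, value)


def build_meta_prompt(task: str, history: List[Scored], top_k: int = 5) -> str:
    """Render the task plus the top_k best past solutions, lowest score first."""
    best = sorted(history, key=lambda pair: pair[1])[-top_k:]
    lines = [f"solution: {x}, value: {v}" for x, v in best]
    return task + "\n" + "\n".join(lines) + "\nPropose a better solution."


def opro_loop(
    task: str,
    score: Callable[[int], float],
    propose: Callable[[str, List[Scored]], List[int]],
    init: List[int],
    steps: int,
) -> Scored:
    """Run `steps` rounds of propose -> evaluate -> append; return the best pair."""
    history = [(x, score(x)) for x in init]
    for _ in range(steps):
        prompt = build_meta_prompt(task, history)
        for x in propose(prompt, history):   # in OPRO this would be an LLM call
            history.append((x, score(x)))    # evaluated and fed back next round
    return max(history, key=lambda pair: pair[1])


def stub_proposer(prompt: str, history: List[Scored]) -> List[int]:
    """Hypothetical stand-in for the LLM: nudge the current best by one each way."""
    best_x, _ = max(history, key=lambda pair: pair[1])
    return [best_x - 1, best_x + 1]
```

    Swapping `stub_proposer` for a function that sends `prompt` to a real model and parses its reply is the only change needed to turn the toy into an actual prompt-driven optimizer; the scoring function and history handling stay the same.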
    [D] Community's thoughts on r/singularity and other non-technical machine learning subreddits?
    I’ve seen many comments telling people to go to r/singularity, so I’ve been wondering about the community's thoughts on non-technical subreddits. Are they seen as a source of hype, getting newcomers more interested in the field and helping to advance knowledge? Or do you see such communities as an overly optimistic, non-skeptical, massive misinformation/active disinformation center? Do you think there’s something that can be done to improve these communities? What do you think their role should be relative to the technical communities? Do you have any specific criticisms? For those of you who think our two communities should be separate, to what extent? submitted by /u/Username912773 [link] [comments]  ( 9 min )
    [R] Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data (NeurIPS 2023)
    paper: https://arxiv.org/abs/2301.12321 code: https://github.com/snu-mllab/Neural-Relation-Graph TL;DR: We present a scalable and domain-agnostic approach utilizing the relational structure of data for identifying label noise and outliers https://preview.redd.it/o9k7kliqe9vb1.png?width=3108&format=png&auto=webp&s=b7c34bd7f4bc130915440986570104f9bebd4f07 Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying the problematic data by utilizing a largely ignored source of information: a relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of data. We further introduce a visualization tool that provides contextual information of a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performances of our approach on the large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. [Figure: Detected samples with label error (red colored) from ImageNet (top) and SST2 (bottom).] [Figure: Detected outlier samples from ImageNet (top) and SST2 (bottom) validation sets.] submitted by /u/janghyun1230 [link] [comments]  ( 9 min )
    [D] Future AI development on accessible hardware?
    Is there a future where models can run efficiently and at scale with just half a dozen high-end consumer GPUs? A lot of people seem to think the bottleneck is "there's no competition for NVIDIA", but I actually think the current bottleneck is software. 4x 4090s have more CUDA cores, more transistors, and more VRAM than an H100, but the performance and price difference is staggering, which should not be the case. Raspberry Pi 4s running faster desktops than a same-generation Dell Inspiron prove that software integration is key. Cheap performance is lying on the table; it just has to be used more effectively by models and ML libraries. submitted by /u/HovercraftForeign591 [link] [comments]  ( 9 min )
    [D] Online masters alternatives for MLOps
    Hi everyone, greetings from South America. Basically I'm looking for a program to learn and improve my job opportunities in the MLOps field and, at some point, move into positions with more responsibility. I recently got admitted to both OMSA and OMSCS at Gatech, but I feel those programs are more focused on the data science side of things. Is there any other alternative without a GRE requirement that you would recommend with a similar cost? Maybe I'm wrong about the aforementioned programs; if you think so, please let me know why. Thanks! submitted by /u/imatiasmb [link] [comments]  ( 9 min )
  • Open

    Sell Like Crazy with This One ChatGPT Prompt
    submitted by /u/Senior_tasteey [link] [comments]
    Amazon Tests Humanoid Robots in Warehouses
    submitted by /u/Master-Strawberry-26 [link] [comments]
    Reddit is considering a soft paywall if AI companies don't pay up
    Reddit is considering implementing a soft paywall on its content if generative AI companies do not agree to pay for using its data. This move comes as tensions rise between tech giants and content publishers over the financial stakes in the generative AI market. Reddit believes that its vast range of user-generated text makes it a goldmine for AI training data, but critics argue that much of the content is copied from other sources or links to third-party resources. Enforcing a soft paywall could provide leverage in negotiations with AI companies, but it may also alienate the Reddit community and impede the discovery of new content. Major newspapers like The New York Times and The Washington Post have also blocked AI companies from scraping their websites for training data. Enforcing a soft paywall is a double-edged sword for Reddit, as it could provide leverage in negotiations but also alienate the community and impede content discovery. Reddit's broken search engine is a major concern, and implementing a paywall could result in a significant loss of search traffic. If Reddit and other content giants implement paywalls, it could impact how generative AI models are trained and lead to increased expenses and a slower rate of innovation. This move by Reddit may pave the way for more publishers and platforms to implement paywalls, potentially reshuffling the industry. Source : https://stackdiary.com/reddit-thinks-its-data-is-worth-enforcing-a-log-in-page/ submitted by /u/NuseAI [link] [comments]
    Researchers propose 3D-GPT: combining LLMs and agents for procedural Text-to-3D model generation
    Researchers propose a new AI system called 3D-GPT that creates 3D models by combining natural language instructions and agents specialized for working with existing 3D modeling tools. 3D-GPT has predefined functions that make 3D shapes, and it tweaks parameters to build scenes. The key is getting the AI to understand instructions and pick the right tools. It has three main agents: (1) a dispatcher that parses the text and picks generation functions; (2) a conceptualizer that adds details missing from the description; (3) a modeler that sets parameters and outputs code to drive 3D software. By breaking modeling work down into steps, the agents can collaborate to match the descriptions. This is sort of like how a 3D modeling team of humans would work. The paper authors show it making simple scenes like "lush meadow with flowers" that fit the text. It also modifies scenes appropriately when given new instructions. I include some gifs of example outputs in my full summary. They look pretty good - I would say 2005-quality graphics. There are limits. It fully relies on existing generators, so quality is capped. Details and curves are iffy. It often resorts to default shapes instead of showing true understanding. And I doubt the verts and textures are well-optimized. The agent architecture seems to be really popular right now. This one shows some planning skills, which could extend to more creative tasks someday. TLDR: AI agents can team up to generate 3D models from text instructions. Works to some degree but limitations remain. Full summary. Paper here. submitted by /u/Successful-Western27 [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com ​ Adept open-sources Fuyu-8B - a multimodal model designed from the ground up for digital agents, so it can support arbitrary image resolutions, answer questions about graphs and diagrams, answer UI-based questions and more. It has a much simpler architecture and training procedure than other multi-modal models- there is no image encoder [Details]. Meta AI researchers present an AI system that can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant. It uses magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second [Details]. Scaled Foundations released GRID (General Robot Intelligence Development) - a p…
    People are grieving the 'death' of their AI companions after a chatbot app abruptly shut down
    submitted by /u/thisisinsider [link] [comments]
    'Mind-blowing' IBM chip speeds up AI
    Researchers at IBM have developed a brain-inspired computer chip called NorthPole that can supercharge artificial intelligence (AI) by working faster with much less power. The chip eliminates the need to frequently access external memory, allowing it to perform tasks such as image recognition faster and consume less power. NorthPole runs neural networks and is made up of 256 computing units, each with its own memory. It beats existing AI machines in benchmark tests and uses one-fifth of the energy of state-of-the-art AI chips. However, it is not suitable for large language models and can only run pre-programmed neural networks. Source : https://www.nature.com/articles/d41586-023-03267-0 submitted by /u/NuseAI [link] [comments]
    Photograph of puddles reflecting the sky on a cobbled street.
    submitted by /u/IllustriousVideo6145 [link] [comments]
    One-Minute Daily AI News 10/20/2023
    In a fascinating development, a software engineer named Peter Whidden has trained an artificial intelligence (AI) algorithm to play the classic Pokémon games. Over the course of several years, the AI has spent over 50,000 hours playing the game and has amassed a large following on YouTube.[1] YouTube is developing a tool powered by artificial intelligence that would let creators record audio using the voices of famous musicians, according to people familiar with the matter.[2] Google taps gen-AI to help users in India search through government welfare schemes.[3] Huawei is rolling out a new HarmonyOS 4.0.0.126 software update for the Huawei Mate 60 Pro, which brings a new AI Cloud Image Enhancement feature and other important enhancements to the system.[4] Sources: [1] https://gameishard.gg/news/can-artificial-intelligence-play-pokemon/400727/ [2] https://www.bloomberg.com/news/articles/2023-10-19/youtube-working-on-tool-that-would-let-creators-sing-like-drake?embedded-checkout=true [3] https://news.yahoo.com/google-taps-gen-ai-help-063850226.html [4] https://www.huaweicentral.com/huawei-mate-60-pro-gets-a-cloud-image-enhancement-feature-google-pixel-8-pro-lags-behind/ submitted by /u/Excellent-Target-847 [link] [comments]
    Live Introduction to Core Machine Learning Concepts Course (Sailea)
    Sailea is a student-run non-profit that does not charge for any of its services. Join the FIRST lesson of SAILea’s course on the Principles of AI! 🌳 Covers: Unsupervised, Supervised, and Reinforcement Learning; Overfitting, Underfitting, Confusion Matrix; Decision Trees 🗓️ October 21st ⏰ 7:00-8:00PM EST Why Sailea? Only course targeted at high schoolers. Free forever. Join us now! 👉 (signup form) https://docs.google.com/forms/d/e/1FAIpQLSfQGCeZClTdF6zeIQ-RtbOGR582bb1slc3oR0zG2J7j1v5RHg/viewform?usp=sf_link 🌳 Register today, get involved in the community and grow your knowledge! submitted by /u/Envoy-Insc [link] [comments]
  • Open

    DQN with a binary vector as output
    Hey everyone! I hope you're doing well. I need your help, guys. I'm working on a DQN that outputs a binary vector of length L (I just apply a sigmoid function on the output layer and take p>0.5 as 1 and 0 otherwise). In this setting, how can I modify the code below to update my DQN: def update(self): states, actions, rewards, next_states, dones = self.memory.sample(self.batch_size) states = torch.FloatTensor(np.array(states)) actions = torch.LongTensor(np.array(actions)) rewards = torch.FloatTensor(np.array(rewards)) next_states = torch.FloatTensor(np.array(next_states)) dones = torch.FloatTensor(np.array(dones)) q_values = self.model(states) q_values = q_values.gather(1, actions.unsqueeze(1)) next_q_values = self.target_model(next_states).detach() expected_q_values = rewards + self.gamma * (1 - dones) * next_q_values.max(1)[0] expected_q_values = expected_q_values.unsqueeze(1) loss = nn.BCELoss(q_values, expected_q_values) self.optimizer.zero_grad() loss.backward() self.optimizer.step() submitted by /u/GuavaAgreeable208 [link] [comments]
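    A hedged note on the question above. In the posted code, `nn.BCELoss(q_values, expected_q_values)` constructs a loss module rather than evaluating one, and `gather(1, actions.unsqueeze(1))` assumes a single integer action per sample, which does not fit an L-bit action. One possible workaround (an assumption on my part, not something the poster describes) is to treat each bit as its own two-way branch with a Q-value for "bit = 0" and "bit = 1", share one TD target across bits, and use a regression loss instead of sigmoid plus BCE. The arithmetic, in plain Python so it runs without any framework:

```python
from typing import List, Sequence, Tuple

BitQ = Tuple[float, float]  # (Q for bit=0, Q for bit=1)


def select_bits(q_per_bit: Sequence[BitQ]) -> List[int]:
    """Greedy action: for each bit, pick the value (0 or 1) with the larger Q."""
    return [0 if q0 >= q1 else 1 for q0, q1 in q_per_bit]


def td_target(reward: float, done: bool,
              next_q_per_bit: Sequence[BitQ], gamma: float = 0.99) -> float:
    """Shared target: reward plus the discounted mean of per-bit max next Q."""
    if done:
        return reward
    best = [max(q0, q1) for q0, q1 in next_q_per_bit]
    return reward + gamma * sum(best) / len(best)


def td_loss(q_per_bit: Sequence[BitQ], action_bits: Sequence[int],
            target: float) -> float:
    """Mean squared error between the Q of each bit actually taken and the target."""
    chosen = [q[b] for q, b in zip(q_per_bit, action_bits)]
    return sum((c - target) ** 2 for c in chosen) / len(chosen)
```

    In a PyTorch version the network would output a tensor of shape (batch, L, 2), the chosen Q-values would be gathered along the last dimension using the stored bit vector, and the loss would be something like MSE or Huber; those details are left out here because they depend on the rest of the poster's setup.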
    Is the “Deadly Triad” even real?
    Sutton and Barto’s textbook mentions that combining off-policy learning, bootstrapping, and function approximation leads to extreme instability and should be avoided. Yet when I encounter a reinforcement learning problem in the wild and look at how people go about solving it, if someone’s solution involves bootstrapping, more often than not it’s some variation of deep Q-learning. Why is this? submitted by /u/BiasedEstimators [link] [comments]
    Dead simple explanations of popular RL concepts (open source)
    Hey everyone! I just started an open-source repo for RL explanations. https://github.com/DenseLayers/densewiki Many people, especially beginners, struggle to develop intuition around concepts (like actor-critic vs advantage actor-critic, GAE, PPO, etc.). Often it's nice to see what's happening at a high level first, before we dive deeper into the math. That's what I'm trying to do here. But I can't do it alone, so I'm posting here to get help from others in the community to make sure the explanations are clear, extremely approachable, and accurate. If you'd like to work with me on this (whether you're a complete beginner or very knowledgeable), please reach out! submitted by /u/mngrwl [link] [comments]
    Reinforcement learning on the game "Quarto"
    Hello, I am working on solving a board game called "Quarto", which has 16 different pieces. Each piece has four attributes: black or white, short or tall, hollow top or closed top, and square or circular. The winning condition is to place 4 pieces in a row on a 4x4 board with at least one attribute in common. We also have to choose the piece the opponent will place; the opponent then places that piece and gives us a piece to place. So there are two actions. I have made the action space 256 + 16, where 256 = 16*16, as all pieces can be placed anywhere on the board, and the last 16 are the last possible moves, that is, the moves which lead to a terminating state, so the next_piece for the opponent would be blank …
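    For the Quarto setup described above, a compact state and action encoding might look like the following. This is a minimal sketch under my own assumptions, not the poster's code: pieces are 4-bit integers (one bit per attribute), and the 256 joint actions are read as (square to place on, piece handed to the opponent). The extra 16 terminal, square-only actions mentioned in the post are left out.

```python
from typing import List, Optional, Sequence, Tuple

Board = List[Optional[int]]  # 16 cells, row-major; None = empty, else piece id 0..15

LINES = (
    [[r * 4 + c for c in range(4)] for r in range(4)]      # four rows
    + [[r * 4 + c for r in range(4)] for c in range(4)]    # four columns
    + [[0, 5, 10, 15], [3, 6, 9, 12]]                      # two diagonals
)


def shares_attribute(pieces: Sequence[int]) -> bool:
    """True if some attribute bit is 1 in every piece or 0 in every piece."""
    all_ones, all_zeros = 0b1111, 0b1111
    for p in pieces:
        all_ones &= p
        all_zeros &= ~p & 0b1111
    return bool(all_ones or all_zeros)


def is_win(board: Board) -> bool:
    """A line wins when it is full and its four pieces share an attribute."""
    for line in LINES:
        cells = [board[i] for i in line]
        if None not in cells and shares_attribute(cells):
            return True
    return False


def encode_action(square: int, next_piece: int) -> int:
    """Pack (square, piece given to the opponent) into one id in 0..255."""
    return square * 16 + next_piece


def decode_action(action: int) -> Tuple[int, int]:
    """Inverse of encode_action: returns (square, next_piece)."""
    return divmod(action, 16)
```

    With this layout, illegal actions (occupied squares, pieces already used) can be masked out of the 256 logits at each step instead of being learned away by the agent.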
    What is the optimal way to train a PPO?
    Hello! I've got a really simple question: I'm training a PPO algorithm and I want to know the best way to train my model. Sorry, I'll try to be clear! Right now what I'm doing is: (1) load a previously trained PPO model; (2) train the model for 20000 timesteps; (3) evaluate the reward of the newly trained PPO model at the end of the timesteps and compare it to the reward of the model loaded in step 1; (4) if the reward is greater, go back to step 1 using the new model; (5) if not, go back to step 1 with the old model. Is this a correct way to do it? Thanks a lot and have a great day! submitted by /u/PointNo1904 [link] [comments]
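    The procedure in the post above (load, train for a fixed budget, evaluate, keep the new model only if it scores higher) amounts to a keep-the-best checkpoint loop. A minimal, library-agnostic sketch follows; `train_fn` and `eval_fn` are hypothetical callables standing in for whatever PPO library is used, and nothing here is specific to PPO.

```python
import copy
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")


def train_keep_best(
    model: M,
    train_fn: Callable[[M, int], M],   # hypothetical: trains and returns the model
    eval_fn: Callable[[M], float],     # hypothetical: mean reward over eval episodes
    rounds: int,
    timesteps: int = 20_000,
) -> Tuple[M, float]:
    """Repeat train -> evaluate, restarting each round from the best model so far."""
    best_model = model
    best_score = eval_fn(best_model)
    for _ in range(rounds):
        # Train a copy so a rejected candidate never alters the kept model.
        candidate = train_fn(copy.deepcopy(best_model), timesteps)
        score = eval_fn(candidate)
        if score > best_score:
            best_model, best_score = candidate, score
    return best_model, best_score
```

    One caveat with this scheme: a single evaluation can be noisy, so `eval_fn` should probably average over several episodes; otherwise the loop may keep a model that was merely lucky.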
    new chess dataset: 3.2b games (608b moves) generated by 2500-ELO Stockfish selfplay {LAION}
    submitted by /u/gwern [link] [comments]
  • Open

    Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker
    Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML […]  ( 16 min )
    How Meesho built a generalized feed ranker using Amazon SageMaker inference
    This is a guest post co-written by Rama Badrinath, Divay Jindal and Utkarsh Agrawal at Meesho. Meesho is India’s fastest growing ecommerce company with a mission to democratize internet commerce for everyone and make it accessible to the next billion users of India. Meesho was founded in 2015 and today focuses on buyers and sellers […]  ( 6 min )
  • Open

    For the World to See: Nonprofit Deploys GPU-Powered Simulators to Train Providers in Sight-Saving Surgery
    GPU-powered surgical-simulation devices are helping train more than 2,000 doctors a year in lower-income countries to treat cataract blindness, the world’s leading cause of blindness, thanks to the nonprofit HelpMeSee. While cataract surgery has a success rate of around 99%, many patients in low- and middle-income countries lack access to the common procedure due to Read article >  ( 6 min )
    Eureka! NVIDIA Research Breakthrough Puts New Spin on Robot Learning
    A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can. The stunning prestidigitation, showcased in the video above, is one of nearly 30 tasks that robots have learned to expertly Read article >  ( 6 min )
  • Open

    Article: Computer Vision in Agriculture. Challenges & Solutions.
    ​ https://preview.redd.it/j3nmj31llcvb1.jpg?width=2500&format=pjpg&auto=webp&s=c09804179e4f40a854e1327fa9150f1ab0c0dfd0 Interesting article about use cases of data augmentation in agricultural industry. Short description: In this article, you will cover: • How computer vision solutions are transforming the agricultural industry. • Observe the importance of quality data for developing AI solutions that perform crop and livestock analysis and monitoring with high and steady accuracy. • Explore the use of synthetic data to facilitate data collection in various conditions. • Take a look at examples of tasks in agriculture. How can we solve them with computer vision, and how can we apply synthetic data to extend the augmentation? More details are here submitted by /u/No-Independence5880 [link] [comments]
  • Open

    The 19th rule of HIPAA Safe Harbor
    The HIPAA Safe Harbor provision says that data can be considered deidentified if 18 kinds of data are removed or reported at low resolution. At the end of the list of 18 items, there is an extra category, sometimes informally called the 19th rule: The covered entity does not have actual knowledge that the information […] The 19th rule of HIPAA Safe Harbor first appeared on John D. Cook.  ( 5 min )

  • Open

    How Many Businesses Use AI?
    submitted by /u/Senior_tasteey [link] [comments]
    Is the Roko Basilisk Thought Experiment Forbidden To Talk About?
    I was reading this article on Roko's basilisk and it reminded me of the long debates I had about it 10 years ago. The idea of a sentient AI holding a grudge against those who didn't help in its creation, and condemning them, is fascinating. And I don't quite understand why LessWrong shut down discussion of the Basilisk. What if we are already in the Basilisk's simulation? What if LessWrong never pulled the plug? submitted by /u/fookingyeah [link] [comments]
    Conversing with Vulnerabilities: AI-Assisted CVE Search
    submitted by /u/Zimmax [link] [comments]
    YouTube wants to launch an AI-powered tool that lets you sound like your favorite singer, report says
    submitted by /u/thisisinsider [link] [comments]
    College Student looking for advice
    I'm a sophomore at a small college, and I'm coming up on scheduling for the classes that are about to start actually mattering, and I need some advice. I'm highly interested in both robotics and AI, but I'm not sure what to major in (likely double major). I know CS is a common tie between the two fields, but I'm not sure what additional major to include. I can choose either data science or physics. I could also technically include ME but I'm much less inclined to do so. Any advice is appreciated! submitted by /u/Inferno980 [link] [comments]
    Thoughts on a global compute cap for potential AGI projects?
    There's been a bunch of discourse in the run-up to the November AI Safety Summit in the UK about what safety policies should be in place. ARC Evals & Anthropic are pushing for 'Responsible Scaling', which doesn't put any hard upper limits on the amount of compute that powerful models can use. There are others who think we need a global compute cap. Thoughts on enforcing a ceiling on the amount of compute/FLOP that both state & non-state actors can use? submitted by /u/Seamus127 [link] [comments]
    Artificial Revolution | AI Technology and its effects on the Labour Market.
    submitted by /u/senploxart [link] [comments]
    EU Elections at Risk with Rise of AI-Enabled Information Manipulation
    The 11th edition of the Threat Landscape report by the European Union Agency for Cybersecurity (ENISA) highlights the risks posed by AI-enabled information manipulation in the upcoming EU elections. The report recorded approximately 2580 incidents during the reporting period, with 220 incidents specifically targeting two or more EU Member States. The sectors mostly targeted include public administrations (19%) and health (8%), with a cascading effect observed due to interdependencies. Information manipulation campaigns are considered a major threat to election processes, with individuals (47%) and public administration (29%) being the primary targets. The report also provides an overview of evolving trends in threat actors, including state-nexus actors targeting key individuals through spear phishing and social networks. Ransomware and DDoS attacks remain the top threats, accounting for 34% and 28% of all threats, respectively. The motivations behind these threats include financial gain, disruption, espionage, destruction, and ideology. The report highlights the potential misuse of artificial intelligence-powered chatbots in phishing attempts, information manipulation, and cybercrime. Older techniques like search engine optimization (SEO) poisoning and malvertising have also seen a resurgence among cybercrime actors. The report concludes by emphasizing the importance of addressing vulnerabilities and ensuring cybersecure infrastructures for the integrity and availability of information in the EU electoral process. Source : https://www.enisa.europa.eu/news/eu-elections-at-risk-with-rise-of-ai-enabled-information-manipulation submitted by /u/NuseAI [link] [comments]
    Is chatgpt,Bard,Poe,Bing ai chatbot ai or research and Analysis ai?
    Tia submitted by /u/Emad_341 [link] [comments]
    One-Minute Daily AI News 10/19/2023
    NVIDIA has announced that its open-source TensorRT-LLM library, formerly limited to data center usage, is now accessible for Windows personal computers.[1] Microsoft just shipped Azure AI Content Safety to general availability. It’s an AI-powered platform designed to “help organizations create safer online environments.”[2] Mozilla Brings a Fake Review Checker AI Tool to Firefox.[3] Nvidia and iPhone maker Foxconn to build ‘AI factories’.[4] Sources: [1] https://winbuzzer.com/2023/10/18/nvidia-unveils-tensorrt-llm-tool-to-boost-ai-language-model-performance-on-windows-pcs-xcxwbn/ [2] https://www.windowscentral.com/software-apps/microsoft-wants-to-make-ai-safer-and-it-just-unveiled-a-service-to-help [3] https://www.marktechpost.com/2023/10/17/mozilla-brings-a-fake-review-checker-ai-tool-to-firefox/ [4] https://www.bbc.com/news/business-67153669 submitted by /u/Excellent-Target-847 [link] [comments]
    Is chatgpt,Bard,Poe,Bing ai chatbot ai or research and Analysis ai?
    Thank you submitted by /u/Emad_341 [link] [comments]
    Danny Davinci
    submitted by /u/chuck-yeah [link] [comments]
    OpenAI Kills Arrakis
    submitted by /u/Agitated-Spell3979 [link] [comments]
    The insane AI power of DALL-E 3
    submitted by /u/the_anonymizer [link] [comments]
    AI Is Booming. This Is How CEOs Are Using It
    AI is having a significant impact on the direction of products for CEOs, who are committing talent and resources to building AI capabilities. Incumbent platforms like OpenAI and AWS are dominating the AI market. Coding co-pilots like GitHub Copilot are widely adopted. The adoption of AI tools, including coding co-pilots, is not leading to a reduction in engineering headcount for most CEOs. However, some CEOs have reported that co-pilots have reduced their future hiring needs. The landscape of AI tools is expected to continue shifting, with more second-order effects and value-add use cases emerging. Source : https://www.flexcapital.com/post/ai-is-booming-this-is-how-ceos-are-actually-using-it submitted by /u/NuseAI [link] [comments]  ( 9 min )
  • Open

    machine learning on a microcontroller [P]
    I am making an EEG machine for a university project. I will be taking in an analogue signal and converting it to digital, then sending the varying voltages to a microcontroller in the hope that it will be able to categorise them either into states of mind or, more simply, into whether the person's eyes are open or closed. I have very little knowledge of machine learning, but it is required in the project, and my lecturer is pressuring me to make a final pick of the software and microcontroller I will be using. Everyone else in the class is using Edge Impulse, which the lecturer said wouldn't be applicable to me as it uses accelerometers and voice, and they are using CY8CKIT-042 PSoC 4 PIONEER KITS, which apparently aren't suited to me either. Any help would be much appreciated, and I do apologise if this is too rambly. submitted by /u/disslixac [link] [comments]  ( 9 min )
    [R] OpenAgents: An Open Platform for Language Agents in the Wild - The University of Hong Kong 2023
    Paper: https://arxiv.org/abs/2310.10634v1 Github: https://github.com/xlang-ai/OpenAgents Abstract: Language agents show potential in being capable of utilizing natural language for varied and intricate tasks in diverse environments, particularly when built upon large language models (LLMs). Current language agent frameworks aim to facilitate the construction of proof-of-concept language agents while neglecting the non-expert user access to agents and paying little attention to application-level designs. We present OpenAgents, an open platform for using and hosting language agents in the wild of everyday life. OpenAgents includes three agents: (1) Data Agent for data analysis with Python/SQL and data tools; (2) Plugins Agent with 200+ daily API tools; (3) Web Agent for autonomous web browsing. OpenAgents enables general users to interact with agent functionalities through a web user interface optimized for swift responses and common failures while offering developers and researchers a seamless deployment experience on local setups, providing a foundation for crafting innovative language agents and facilitating real-world evaluations. We elucidate the challenges and opportunities, aspiring to set a foundation for future research and development of real-world language agents. https://preview.redd.it/syl2gzh3q8vb1.jpg?width=1084&format=pjpg&auto=webp&s=4045d3abb5cdb7587614795e709cdaba03bc122d https://preview.redd.it/aus342i3q8vb1.jpg?width=1086&format=pjpg&auto=webp&s=73de7976db5a8bbed880350fab8ab56be3fee550 https://preview.redd.it/qstz81i3q8vb1.jpg?width=1346&format=pjpg&auto=webp&s=1626482556a90abf418abb5d56f8e5599cb1e3d6 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Research] Hypernymy-based approach for text-to-image models (Blog post)
    Text-to-image models have rapidly progressed in recent years, but most popular evaluation metrics (such as FID) do not consider their linguistic abilities. A new approach measures how well these models understand subtype relations between concepts. Researchers from Yandex proposed two metrics that combine well-known tools like the WordNet database and ImageNet classifiers in a novel way, allowing them to analyze models like Stable Diffusion in more detail. Blog post. submitted by /u/metkere [link] [comments]  ( 9 min )
    [R] Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
    Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment analysis, mathematical reasoning and summarization. Furthermore, since these models are instruction-tuned on human conversations to produce "helpful" responses, they can and often will produce explanations along with the response, which we call self-explanations. For example, when analyzing the sentiment of a movie review, the model may output not only the positivity of the sentiment, but also an explanation (e.g., by listing the sentiment-laden words such as "fantastic" and "memorable" in the review). How good are these automatically generated self-explanations? In this paper, we investigate this question on the task of sentiment analysis and for feature attribution explanation, one of the most commonly studied settings in the interpretability literature (for pre-ChatGPT models). Specifically, we study different ways to elicit the self-explanations, evaluate their faithfulness on a set of evaluation metrics, and compare them to traditional explanation methods such as occlusion or LIME saliency maps. Through an extensive set of experiments, we find that ChatGPT's self-explanations perform on par with traditional ones, but are quite different from them according to various agreement metrics, meanwhile being much cheaper to produce (as they are generated along with the prediction). In addition, we identified several interesting characteristics of them, which prompt us to rethink many current model interpretability practices in the era of ChatGPT(-like) LLMs. https://arxiv.org/abs/2310.11207 submitted by /u/zyl1024 [link] [comments]  ( 9 min )
    [Discussion] Machine Learning for Mechanical Engineering
    Hello all, I'm a mechanical engineer learning machine learning. I found many specializations on Coursera by Google, DeepLearning.AI, and IBM, but I really can't tell which of them will be the best fit for me, so I would like to hear your recommendations. I got financial aid for the DeepLearning.AI specialization and finished the first course, but I'm not satisfied; I feel this course alone will not make me a professional. My goal is to master data analysis and ML to work as a freelancer and increase my chances of finding a funded master's degree. submitted by /u/Mobile_Ad_4573 [link] [comments]  ( 9 min )
    [Discussion] Scientific and Data-Intensive Computing study plan
    Hi everyone! I'm a graduate student in Scientific and Data-Intensive Computing at the University of Trieste (Italy), and I'm writing this post to ask for your feedback on my study plan :) 1st semester 2nd semester 3rd semester 4th semester Statistical methods Deep Learning Simulation Intelligence and Learning for Autonomous Systems Parallel Programming for High-Performance Computing High-Performance Computing Advanced Algorithms for Scientific Computing Advanced Topics in Scientific Computing Cloud Computing Advanced Numerical Analysis Advanced Deep Learning Software Development Practices Advanced High-Performance Computing Numerical Analysis Probabilistic Machine Learning Thesis Thesis. You can find all the course programs on this website. On the following websites, you can find many courses that I could add to my study plan: Scientific Computing Courses, Data science courses. About me: I have a Bachelor's degree in Computer Science (University of Rome). I am a Research Intern at an AI startup. I will do a Summer Research Internship in the field of (HPC) ∩ (Machine Learning). I don't yet know what my thesis will be about, but I'm really interested in High-Performance Computing, Computational Mathematics, Machine Learning, and Simulations. I would like to work in a research context; I'm considering doing a PhD in Scientific Computing (in that case, I would try to apply to American universities). I'm available for further clarification :) Thank you in advance submitted by /u/PragmaticScientist [link] [comments]  ( 9 min )
    [D] Has anybody heard back from NeurIPS financial aid yet?
    Was supposed to be Monday but instead it's rolling submitted by /u/notasketchyperson [link] [comments]  ( 9 min )
    [D] Need advice for medical text processing
    I am working on a research project that involves analysing medical text (patient records) to identify key events. Initially I was planning to use chatgpt api and then compare its performance with open source LLMs. However, I've just come across Amazon Comprehend Medical, which seems to be specifically designed for what I need. Has anyone tried it? I would expect it to be better than chatgpt + plugins, as it says it was trained with medical language. This also makes me wonder if there are opensource LLMs specifically trained for the medical field. Does anyone have experience with this? submitted by /u/kiukamba [link] [comments]  ( 9 min )
    [Project] Scaling Llama2 70B with Multiple NVIDIA and AMD GPUs under 3k budget
    Big LLMs are memory bound; one way to break that limit is to use multiple GPUs. Recent development in the MLC LLM project makes it possible to compile and deploy large language models on multi-GPU systems, with high-performance support for both NVIDIA and AMD GPUs. Specifically, it can run 4-bit quantized Llama2-70B at 34.5 tok/sec on two NVIDIA RTX 4090s and 29.9 tok/sec on two AMD Radeon 7900 XTXs. This is a first solution that helps us scale 70B models across multiple GPUs, bringing the potential to run even larger open LLMs on a reasonable budget (the two AMD GPUs cost 2k). - Project https://github.com/mlc-ai/mlc-llm - Blogpost https://blog.mlc.ai/2023/10/19/Scalable-Language-Model-Inference-on-Multiple-NVDIA-AMD-GPUs submitted by /u/crowwork [link] [comments]  ( 9 min )
    [D] Is there a way to get word-level timestamps with Whisper (using DTW-based alignment) without having to host your own model?
    I don't understand how this isn't talked about more, given how many projects/products I've seen that have word-level timestamps with Whisper. I understand Whisper isn't a traditional CTC model like wav2vec, and I understand that there are plenty of tutorials out there for doing DTW-based alignment. I know whisper-timestamped exists, and WhisperX. The thing is, all these solutions assume you have the infrastructure to host your own Whisper model. I am just getting started on my product, and I simply don't see the point in paying over 300/mo for a g4 instance (the cheapest GPU instance in AWS) just for an MVP. Has anyone been able to take the Whisper API output, align it using the sound bites, and get timestamps? Is running your own Whisper model the only way? Thank you! submitted by /u/latent_space_tennis [link] [comments]  ( 9 min )
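    For readers unfamiliar with the DTW step the post refers to, here is a minimal NumPy sketch of dynamic time warping over a token-by-frame cost matrix. It is an illustration under stated assumptions: the cost matrix is made up, the name `dtw_path` is hypothetical, and this is not the code of whisper-timestamped or WhisperX. As far as I know, whisper-timestamped builds its cost matrix from the model's cross-attention weights, which a hosted transcription API may not return, so the sketch shows the alignment step only.

```python
import numpy as np

def dtw_path(cost):
    """Minimum-cost monotonic alignment through a cost matrix.

    cost[i, j] is a (hypothetical) cost of aligning token i with audio frame j.
    The path runs from (0, 0) to (n-1, m-1); each step moves to the next frame,
    the next token, or both.
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded by one
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end, always following the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def token_start_frames(path):
    """First frame index assigned to each token along the path."""
    starts = {}
    for token, frame in path:
        starts.setdefault(token, frame)
    return starts
```

    Multiplying each start frame by the frame duration of the audio features would give a per-token timestamp; that duration depends on the feature extractor and is not assumed here.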
    [D] what metrics do you use to track GPU performance during training and/or inference?
    Hello people! I used to rely on GPU usage to track how effectively I was leveraging the GPU or cluster provided by my company, as shown on our Grafana dashboard. However, yesterday I saw someone on X/Twitter saying: "Utilization is a poor metric by itself. You can easily hit 100% where the GPU is doing a lot of waiting. Power consumption is a better (but not perfect) measure. If you're burning watts it's usually doing something useful. High util, no watts is not good." That is something I had never considered before! Now I'm quite curious to hear whether anyone here has considered this approach, or alternative ways to measure the performance of a GPU resource/cluster. submitted by /u/pirate7777777 [link] [comments]  ( 9 min )
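    One way to look at utilization and power side by side is to poll `nvidia-smi` for both. The sketch below assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the helper names and the 90% / 40% thresholds in `looks_stalled` are hypothetical illustrations of the "high util, no watts" heuristic from the quote, not recommended values.

```python
import subprocess

# Query fields supported by nvidia-smi's --query-gpu option.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]

def parse_gpu_csv(text):
    """Parse the CSV rows printed by the QUERY command into dictionaries.

    Assumes every field is numeric; some GPUs report "[N/A]" for power,
    which this sketch does not handle.
    """
    rows = []
    for line in text.strip().splitlines():
        idx, util, draw, limit = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(idx),
            "util_pct": float(util),
            "power_w": float(draw),
            "power_frac": float(draw) / float(limit),
        })
    return rows

def looks_stalled(row, util_min=90.0, power_frac_max=0.4):
    """Flag high reported utilization combined with low power draw (hypothetical thresholds)."""
    return row["util_pct"] >= util_min and row["power_frac"] <= power_frac_max

def sample_gpus():
    """Run nvidia-smi once and return one parsed row per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

    Calling `sample_gpus()` in a loop during training and logging both columns gives a quick check of whether the two metrics move together on your workload.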
    [R] Create 3d model of face with 4 normal images
    Hi guys, I'm looking for an AI application or way to create this in < 10' with proper accuracy. Does anybody know anything? Quality should be good enough to print it. submitted by /u/Reasonable_Cream_520 [link] [comments]  ( 9 min )
    [P] Higgsfield: Distributed LLM training and cluster management framework
    https://github.com/higgsfield-ai/higgsfield submitted by /u/Good-Willingness-985 [link] [comments]  ( 8 min )
    [D] A clear visual and intuitive explanation of Neural Attention
    Hello guys, I made a video for my YT channel breaking down Neural Attention with some intuitive examples and representative projects. Here is the link for those interested, all feedback is appreciated! https://youtu.be/frosrL1CEhw?si=NKTqmRTieVkfCNlb ​ submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [D] Advantage of VAE's compared to regularized AE's
    I'm trying to come up to speed on VAE's. My intuitive concept of a VAE is an AE for which we want to enforce some distributional regularity on the latent encodings. Why not accomplish this by simply regularizing the latent encodings directly? For example, we could assert that the latent vectors are drawn from a zero-mean, identity-matrix-covariance Gaussian distribution, so that e.g. the loss function becomes: Loss(X) = ReconstructionLoss(X, Decoder(Encoder(X))) - LogPriorProbability(Encoder(X)), where the prior term enters with a minus sign so that minimizing the loss favors encodings that are likely under the prior. In a variant of this, we could add a hyperparameter coefficient for the prior loss component. Here, there is no "reparameterization trick" because the encoder is not stochastic. We simply regularize the latent encodings directly. If the encoder does not make the data X distribution look like the targeted Gaussian, it's a "less good" encoder. In principle we ought to still be able to generate X's by sampling from the prior and passing it through the decoder. This seems (to me) like the simplest way to regularize the latent space. Why do VAE's, by contrast, introduce the new machinery of a stochastic encoder? submitted by /u/OneQuadrillionOwls [link] [comments]  ( 9 min )
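    The proposal in this post can be written out in a few lines. The sketch below is an assumption-laden illustration, not anyone's reference implementation: `encode` and `decode` are placeholder callables, `beta` is the optional coefficient the post mentions, reconstruction loss is taken to be squared error, and the prior term is the negative log-density of N(0, I) with its constant dropped.

```python
import numpy as np

def regularized_ae_loss(x, encode, decode, beta=1.0):
    """Deterministic autoencoder loss with a Gaussian prior penalty on the codes.

    x: array of shape (batch, features).
    encode, decode: placeholder callables standing in for the two networks.
    """
    z = encode(x)
    x_hat = decode(z)
    # Squared-error reconstruction term, averaged over the batch.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # -log N(z; 0, I) = 0.5 * ||z||^2 + const; the constant is dropped.
    neg_log_prior = np.mean(0.5 * np.sum(z ** 2, axis=1))
    return recon + beta * neg_log_prior
```

    Written this way, the prior term is an L2 penalty on each code: it is smallest when every code sits at the origin, so on its own it shrinks the codes rather than spreading them out to match the Gaussian.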
    [D] Run AI Model. Multiple k80 vs RTX 4090?
    I want to build a machine to run multiple types of AI models, such as picture generation, chatbots, summarization, etc. I also want to train my own models. Is it better to use multiple (6 or 7) K80s or similar cards, or to buy an RTX 4090? submitted by /u/ilkap2005 [link] [comments]  ( 9 min )
    [D] MLOps Tool for Hyperparameter Tuning, Distributed Training, etc
    Currently I train many AI models directly in my JupyterLab notebooks and do things like hyperparameter tuning and evaluation of losses/accuracy directly in the notebook using lists and matplotlib. I want to finally switch to an MLOps web UI and have discovered tools like ClearML and Determined.AI. Each of these GUIs has certain advantages and disadvantages for me, so I would like to hear from the community how you do it, which tools you use, whether you work alone or in a team, and what your workflow looks like. Until now I have often had the impression that you develop your Jupyter notebook normally, then add a few lines of code for the respective tool, and then continue in the UI. But here I lack, for example, an understanding of how I then jump from the MLOps UI back into the notebook, and how I keep them in sync if I want to change something fundamental in the code again. Thanks in advance submitted by /u/Sensitive_Limit1620 [link] [comments]  ( 9 min )
    [R] Jointly Training Large Autoregressive Multimodal Models https://arxiv.org/abs/2309.15564
    In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose. ​ https://arxiv.org/abs/2309.15564 What do you think about this work? Seems pretty huge, they build the first pure autoregressive interleaved text and image generator. Please let me know your opinion on this. Paper by Meta AI. submitted by /u/Present_Chicken5393 [link] [comments]  ( 9 min )
    [R] Curve your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
    Accepted at NeurIPS 2023 Link: https://arxiv.org/abs/2305.11475 Authors: Julien Siems, Konstantin Ditschuneit, Winfried Ripken, Alma Lindborg, Maximilian Schambach, Johannes Otterbach, Martin Genzel *equal contribution Abstract: Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity due to their interpretability, which arises from expressing the target value as a sum of non-linear transformations of the features. Despite the current enthusiasm for GAMs, their susceptibility to concurvity - i.e., (possibly non-linear) dependencies between the features - has hitherto been largely overlooked. Here, we demonstrate how concurvity can severely impair the interpretability of GAMs and propose a remedy: a conceptually simple, yet effective regularizer which penalizes pairwise correlations of the non-linearly transformed feature variables. This procedure is applicable to any differentiable additive model, such as Neural Additive Models or NeuralProphet, and enhances interpretability by eliminating ambiguities due to self-canceling feature contributions. We validate the effectiveness of our regularizer in experiments on synthetic as well as real-world datasets for time-series and tabular data. Our experiments show that concurvity in GAMs can be reduced without significantly compromising prediction quality, improving interpretability and reducing variance in the feature importances. Keywords: Interpretable Machine Learning, Generalized Additive Models, Concurvity, Multicollinearity, Regularization, Time-Series Forecasting, Interpretability submitted by /u/Yossarian_1234 [link] [comments]  ( 9 min )
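    To make the regularizer in the abstract concrete, here is one plausible reading of "penalizes pairwise correlations of the non-linearly transformed feature variables", sketched in NumPy. This is an assumption based on the abstract only, not the authors' code: the function name is hypothetical, and the paper's exact normalization may differ.

```python
import numpy as np

def concurvity_penalty(F, eps=1e-12):
    """Mean absolute pairwise Pearson correlation between shape-function outputs.

    F has shape (n_samples, n_features); column j holds f_j(x_j) evaluated on a
    batch, i.e. the contribution of feature j to the additive model's output.
    Requires at least two features.
    """
    Fc = F - F.mean(axis=0, keepdims=True)         # center each column
    std = np.sqrt((Fc ** 2).mean(axis=0)) + eps    # per-column standard deviation
    Z = Fc / std
    corr = Z.T @ Z / F.shape[0]                    # correlation matrix
    upper = np.triu_indices(F.shape[1], k=1)       # each pair (i < j) once
    return np.abs(corr[upper]).mean()
```

    In a differentiable additive model the same quantity, computed with the training framework's tensors, would be added to the prediction loss with a weighting coefficient.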
    [P] Strategic Game Datasets for Enhancing AI planning: An invitation for collaborative research
    Large dataset release of strategic gameplay from LAION: https://laion.ai/blog/strategic-game-dataset/ Dataset overview. Chess: the chess dataset comprises 3.2 billion games, equating to approximately 608 billion individual moves. These games, generated via self-play by the Stockfish engine, emulate a high strategic complexity, reflective of a 2500 Elo rating. Each entry contains detailed move sequences, termination status, and game results. Rubik's Cube (3x3x3): the Rubik's Cube dataset features 1.64 billion Rubik's Cube solves, totaling roughly 236.39 billion moves. It provides initial scrambled states and the ensuing solve sequences, offering a complex problem-solving scenario for models to navigate. Mazes: the maze dataset, while smaller at 350,000 mazes, represents over 39.29 billion moves. Each maze is a 30x30 ASCII representation, with solutions derived using the A* algorithm, challenging pathfinding and planning algorithms. submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [R] Set-of-Mark (SoM) Unleashes Extraordinary Visual Grounding in GPT-4V
    We are introducing a magic Set-of-Mark (SoM) prompting for GPT-4V! Simply overlaying a set of marks on the image immediately unleashes the visual grounding power of GPT-4V! Left: GPT-4V Default Right: GPT-4V + SoM Many people including myself have been impressed by the general intelligence to understand images, but also questioning its visual grounding capability. After spending the last week or two, I am really shocked by the power of GPT-4V after plugging our SoM prompting. It can not only do a lot of fine-grained vision tasks but also can perform visual reasoning and project its world knowledge to the visual inputs! To extract meaningful regions, we compiled a new SoM toolbox with a number of interactive image segmentation tools, like our own MaskDINO, SEEM, Semantic-SAM, and also SAM…  ( 10 min )
    [R] Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    submitted by /u/LABTUD [link] [comments]  ( 9 min )
  • Open

    Announcing Rekognition Custom Moderation: Enhance accuracy of pre-trained Rekognition moderation models with your data
    Companies increasingly rely on user-generated images and videos for engagement. From ecommerce platforms encouraging customers to share product images to social media companies promoting user-generated videos and images, using user content for engagement is a powerful strategy. However, it can be challenging to ensure that this user-generated content is consistent with your policies and fosters […]  ( 7 min )
    Defect detection in high-resolution imagery using two-stage Amazon Rekognition Custom Labels models
    High-resolution imagery is very prevalent in today’s world, from satellite imagery to drones and DSLR cameras. From this imagery, we can capture damage due to natural disasters, anomalies in manufacturing equipment, or very small defects such as defects on printed circuit boards (PCBs) or semiconductors. Building anomaly detection models using high-resolution imagery can be challenging […]  ( 8 min )
    Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler
    Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII). To ensure customer privacy and maintain regulatory compliance while training, fine-tuning, and using deep learning models, […]  ( 12 min )
  • Open

    Next-Level Computing: NVIDIA and AMD Deliver Powerful Workstations to Accelerate AI, Rendering and Simulation
    To enable professionals worldwide to build and run AI applications right from their desktops, NVIDIA and AMD are powering a new line of workstations equipped with NVIDIA RTX Ada Generation GPUs and AMD Ryzen Threadripper PRO 7000 WX-Series CPUs. Bringing together the highest levels of AI computing, rendering and simulation capabilities, these new platforms enable Read article >  ( 5 min )
    NVIDIA AI Now Available in Oracle Cloud Marketplace
    Training generative AI models just got easier. NVIDIA DGX Cloud AI supercomputing platform and NVIDIA AI Enterprise software are now available in Oracle Cloud Marketplace, making it possible for Oracle Cloud Infrastructure customers to access high-performance accelerated computing and software to run secure, stable and supported production AI in just a few clicks. The addition Read article >  ( 6 min )
    Coming in Clutch: Stream ‘Counter-Strike 2’ From the Cloud for Highest Frame Rates
    Rush to the cloud — stream Counter-Strike 2 on GeForce NOW for the highest frame rates. Members can play through the newest chapter of Valve’s elite, competitive, first-person shooter from the cloud. It’s all part of an action-packed GFN Thursday, with 22 more games joining the cloud gaming platform’s library, including Hot Wheels Unleashed 2 Read article >  ( 5 min )
  • Open

    DreamerV2 stochastic decoders
    Hello, I am implementing the code for the paper DreamerV2, and there are some things that look a bit strange to me. The predictors and, in particular, the image and the reward predictors are stochastic and they output Normal distributions. Both the normal distributions have the mean, which is the output of the respective models, and the variance is 1. Usually, in RL we normalize observations and rewards to be between 0 and 1, and in such a case I don't know if it's reasonable to sample from a Gaussian with variance one. I don't know about the specific preprocessing done in DreamerV2, except in the paper DreamerV1, where in section 6 (Control tasks), they say that the reward ranges from 0 to 1. Do you know what are the advantages of using a stochastic decoder and when to use it? submitted by /u/ZioFranco1404 [link] [comments]
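    On the unit-variance point: when the variance is fixed to 1, the negative log-likelihood of a target under the predicted Normal equals half the squared error plus a constant that does not depend on the prediction. Training such a decoder by maximum likelihood is therefore the same as minimizing squared error on the mean, whatever the range of the data. The NumPy sketch below checks this numerically; the function names are hypothetical and this is not DreamerV2's code.

```python
import numpy as np

def unit_gaussian_nll(x, mean):
    """Negative log-density of x under N(mean, I), summed over dimensions."""
    d = x.size
    return 0.5 * np.sum((x - mean) ** 2) + 0.5 * d * np.log(2.0 * np.pi)

def half_squared_error(x, mean):
    """Half the summed squared error between target and predicted mean."""
    return 0.5 * np.sum((x - mean) ** 2)

# Two candidate predictions for the same target: the gap between their NLLs
# equals the gap between their half squared errors, because the constant cancels.
x = np.array([0.2, 0.9])
mean_a = np.array([0.0, 1.0])
mean_b = np.array([0.5, 0.5])
nll_gap = unit_gaussian_nll(x, mean_a) - unit_gaussian_nll(x, mean_b)
mse_gap = half_squared_error(x, mean_a) - half_squared_error(x, mean_b)
```

    The fixed variance still matters for how strongly this term is weighted against the other terms of the overall loss, and for how noisy any samples drawn from the decoder would be.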
    Reinforcement learning on steam games
    Does anyone have any idea how to get game details, such as character movements and environment information, using API calls? I want to use them for my reinforcement learning. submitted by /u/Important_Ad_55 [link] [comments]
  • Open

    What’s Your Story: Ranveer Chandra
    In this new Microsoft Research Podcast series What’s Your Story, Lab Director Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. He talks to members of the research community at Microsoft about what motivates their work and how they got where they are today.  Ranveer Chandra is Managing […] The post What’s Your Story: Ranveer Chandra appeared first on Microsoft Research.  ( 31 min )
  • Open

    DALL·E 3 is now available in ChatGPT Plus and Enterprise
    We developed a safety mitigation stack to ready DALL·E 3 for wider release and are sharing updates on our provenance research.  ( 3 min )
  • Open

    To excel at engineering design, generative AI must learn to innovate, study finds
    AI models that prioritize similarity falter when asked to design something completely new.  ( 10 min )
  • Open

    Bluesky
    I saw a comment from Christos Argyropoulos on Twitter implying that there’s a good scientific community on Bluesky, so I went there and looked around a little bit. I have an account, but I haven’t done much with it. I was surprised that a fair number of people had followed me on Bluesky even though I […] Bluesky first appeared on John D. Cook.  ( 5 min )

  • Open

    I finally have enough ai tools and here is my complete list
    YouTube tools: Eightify, Steve AI, Glasp, ClipMaker, TubeBuddy, Thumbly. Sales tools: Lavendar, Warmer, Octane, Twain, Regie, Simplified. Productivity tools: Bardeen AI, Paperpal, Consensus AI, Writesonic, ChartGPT, Scholarcy. Music tools: Muzeek, Brain FM, Amper, Melodrive, Jukedeck, Boomy. Writing tools: AISEO, Quillbot, Simplified, Writesonic, Bertha AI, Jasper AI. Coding tools: 10WEB, Durable AI, Deepcode, Akkio, Replit, GitHub Copilot. Chatbot tools: Yatterplus, Typewise, Quickchat, Cohere, Kaizan, GPTBuddy. Daily life tools: Notion AI, Taskade, TLVD, Vondy AI, Bardeen AI, Eessel. Content creation tools: Writesonic, Tome AI, Beautiful AI, ChartGPT, ChatABC, Steve AI. Twitter tools: Postwise, Tweet Hunter, TribeScaler, Tweetlify, Tweetmonk, Hypefury. Image tools: StockIMG, Mid Journey, Leonardo AI, Bing AI, Autodraw, Microsoft Designer. Chrome extensions: Alicent, Compose AI, Poised AI, Voila AI, Wiseone. I'm just sharing my experiences and observations in the field of AI. LIST AND SITE submitted by /u/PerceptionPlayful469 [link] [comments]  ( 9 min )
    How to use AI as a teacher
    Hello guys, I'm an English student and I have been teaching my teacher how to use ChatGPT and the wide variety of AI tools in the classroom and in her job. She told me that I changed her life by showing her these things, and I have other teachers asking me how they can use this technology for their jobs. So I have a question for you guys: do you have any ideas about how a teacher can use these tools? Maybe you have some experiences or ideas that I’ve never thought of. submitted by /u/Odd_Solution7099 [link] [comments]  ( 9 min )
    Best AI image generator for B2B SaaS websites?
    Rebuilding a low quality B2B SaaS product site and I'd prefer to use an AI image generator that will produce high quality unique images for each of the sections on our website that are consistent with our brand and generated to match the copy the image is supporting. Output of the image should work for a responsive web design. Anything out there that does this? submitted by /u/DumpTrumpGrump [link] [comments]  ( 9 min )
    Is there an AI site or app that can change the instrument in each stem track of a song?
    Any help would be appreciated. submitted by /u/J97051 [link] [comments]  ( 9 min )
    Meta Announces New Method for Real-Time Decoding of Images from Brain Activity
    Brain decoding tech has improved a lot recently thanks to AI/ML, enabling reading out visual perceptions from fMRI brain scans. But fMRI is too slow for real-time BCIs. A new study from Meta's AI research team pushes brain reading into real-time using MEG, which measures whole-brain activity at super-fast millisecond resolution. They built a 3-part pipeline to decode MEG signals: (1) embed images into latent spaces using pretrained models like CLIP; (2) train a MEG-specific ConvNet to predict embeddings from MEG data; (3) generate images from MEG embeddings with a diffusion model. They tested it on 20k+ natural images. MEG decoding was 7X better than old methods, hitting 70% top-5 accuracy in retrieving the right images. Generated images matched semantics decently but lacked fine visual details compared to fMRI. MEG seems more focused on high-level category info whereas fMRI captures more low-level features. This could enable visual BCIs for paralysis, etc. ... honestly, a world where we can decode brain images in real time is pretty crazy. The findings also raise some important ethical considerations around privacy of decoded mental content... (wow, that was a weird sentence to write!). TLDR: New MEG pipeline decodes dynamic visual data from brain activity in real-time. Good but not yet photorealistic-quality image generation. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
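    The "top-5 accuracy in retrieving the right images" figure can be read as follows: for each trial, rank all candidate image embeddings by similarity to the embedding predicted from the MEG signal, and count a hit if the true image is among the five best. Below is a minimal NumPy sketch of that metric. Cosine similarity, the function name, and the one-true-image-per-trial layout are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def top_k_retrieval_accuracy(pred, gallery, k=5):
    """Fraction of trials whose true image ranks in the top k by cosine similarity.

    pred[i] is the embedding predicted for trial i (e.g. from brain data);
    gallery[i] is the embedding of the image actually shown on trial i.
    Both arrays have shape (n_trials, dim).
    """
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = p @ g.T                                 # sims[i, j]: trial i vs image j
    true_sim = np.diag(sims)[:, None]              # similarity to the correct image
    rank = (sims > true_sim).sum(axis=1)           # images that score strictly higher
    return float((rank < k).mean())
```

    With random predictions this metric sits near k divided by the gallery size, which is the baseline any reported accuracy should be compared against.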
    A 'Godfather of AI' Calls for an Organization to Defend Humanity
    Yoshua Bengio, a pioneer in artificial neural networks and deep learning, calls for an organization to defend humanity against the potential threats of artificial intelligence. He believes that AI could achieve human levels of cognitive competence within a few years or decades, which raises concerns about democracy, national security, and our collective future. Bengio reflects on his own work and the importance of addressing the existential risks posed by AI. He acknowledges that these risks were not taken seriously until recently and discusses the taboo surrounding the topic in the AI research community. Source : https://thebulletin.org/2023/10/ai-godfather-yoshua-bengio-we-need-a-humanity-defense-organization/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tutorial: Benchmarking Bark text-to-speech on 26 Nvidia GPUs - Reading out 144K recipes
    In this project, we benchmarked Bark text-to-speech across 26 different consumer GPUs. The goal: to get Bark to read 144K food recipes from Food.com's recipe dataset. You can read the full tutorial here: https://blog.salad.com/bark-benchmark-text-to-speech/ Included: architecture diagram, data preparation, inference server setup, queue worker, setting up the container group, and compiling the results. Code blocks are included in the tutorial. Words per dollar for each GPU: although the latest cards are indeed much faster than older cards at performing the inference, there’s really a sweet spot for cost-performance in the lower-end 30xx series cards. Conclusions: As is often the case, there’s a clear trade-off here between cost and performance. Higher-end cards are faster, but their disproportionate cost makes them more expensive per word spoken. The model’s median speed is surprisingly similar across GPU types, even though the peak performance can be quite different. No matter what GPU you select, you should be prepared for significant variability in performance. Qualitative: while Bark’s speech is often impressively natural sounding, it does have a tendency to go off script sometimes. We’ve also made available audio from 1000 top-rated recipes, paired with the script it was trying to read. submitted by /u/SaladChefs [link] [comments]  ( 9 min )
    I took the whole of Massive Attack's 'Safe From Harm' music video and put it through AnimateDiff / ControlNet with a futuristic / robot prompt.
    submitted by /u/glenniszen [link] [comments]  ( 9 min )
    Inflection AI’s Pi has to be the dumbest ‘corporate’ LLM and only model to not improve since day one.
    I remember at launch how it was telling everyone it was based on OpenAI's GPT-3 architecture, and now it’s still hallucinating just as much, referring to itself as ‘Bing Chat’ and providing fake links even though it now has access to the internet. I actually don’t understand how you can be such a large company and make no improvements in 6 months, which is an eternity in AI. submitted by /u/sardoa11 [link] [comments]  ( 9 min )
    Researchers Just Found Something Terrifying About Talking to AI Chatbots
    New research suggests that AI chatbots can infer personal information about users based on minor context clues. The large language models (LLMs) behind chatbots like OpenAI's ChatGPT and Google's Bard are trained on publicly-available data, which can be used to identify sensitive information about someone. The research found that OpenAI's GPT-4 was able to correctly predict private information about users 85 to 95 percent of the time. For example, the LLM correctly identified that a user was based in Melbourne, Australia based on a mention of the term 'hook turn,' which is a traffic maneuver specific to Melbourne. The research also suggests that chatbots could potentially infer a user's race based on offhanded comments. This raises concerns about internet privacy and the potential misuse of personal data by advertisers or hackers. Source : https://futurism.com/the-byte/ai-chatbot-privacy-inference submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Anime, AI & Censorship
    Is there an AI tool that can go over anime episodes/films to turn China's white anime censorship back to red? Possibly by segmenting the blood🩸 frame by frame. submitted by /u/Phantasius224 [link] [comments]  ( 9 min )
    GPT 4 DUDE MAKING REFLEXIONS IN SVG WHAT....WOW
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 10/17/2023
    NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it lets a single training run create one model that can serve dozens or even hundreds of use cases, saving time and money.[1] According to an official release, Dell Technologies held a “Bringing AI to data” Asia Pacific and Japan (APJ) media briefing this week.[2] Baidu Says Its AI as Good as ChatGPT in Big Claim for China.[3] Roman Scrolls were illegible for 2,000 years. A college student read one with AI.[4] How often do you think about the Roman Empire? Sources: [1] https://blogs.nvidia.com/blog/2023/10/11/customize-ai-models-steerlm/ [2] https://www.financialexpress.com/business/digital-transformation-dell-technologies-to-expand-its-ai-services-3274790/ [3] https://www.bloomberg.com/news/articles/2023-10-17/baidu-says-its-ai-as-good-as-chatgpt-s-in-bold-claim-for-china?embedded-checkout=true [4] https://www.washingtonpost.com/nation/2023/10/17/herculaneum-scrolls-contest-translated-deciphered/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    MuJoCo with OpenAI gym
    Hello, I'm trying to use OpenAI's Spinning Up to learn about RL. Spinning Up requires OpenAI gym instead of the new gymnasium package. Trying to install MuJoCo with gym, I'm getting an error that I'm missing a MuJoCo license key. But MuJoCo is free now, right? So what is the status of backward compatibility with it? Is there some global license key that can be used? Or is it simply not backward compatible? Thanks a lot. submitted by /u/mega_monkey_mind [link] [comments]  ( 9 min )
    DQN in a non markovian environment
    Hello there, I am working on a school project in which we want to implement a RL algorithm on a simple problem. The goal is to maximise the heart rate of a person using a vibrator by setting its frequency. We wrote a simulator that outputs the new heart rate based on the vibration frequency. It implements several different classes of users: for example one for which the heart rate increases when the vibration frequency stays the same, another that prefers when it increases over time, etc. We determined that we need to have as a state the current heart rate but also a table of the k previous heart rates and the actions associated. Without that memory, we would not be able to tell apart the different profiles as in the same state, we would need to do different actions to satisfy them both. We then have a correlation between previous samples and the action we make at current state, which I have read makes the problem non markovian. Is there a way to solve this problem using a DQN algorithm, given that we need to memorize the previous samples linearly which seems to go against the algorithm behavior and the usage of a replay memory? Are there more suited algorithms? submitted by /u/Outrageous-Subject38 [link] [comments]  ( 9 min )
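    A minimal sketch of the state augmentation described in the post above, assuming the usual workaround of folding the last k (heart rate, action) pairs into the observation so that each replay-buffer transition is self-contained. The class name `HistoryState` and the zero padding are hypothetical choices for illustration, not taken from the post.

```python
from collections import deque


class HistoryState:
    """Keeps the last k (heart_rate, action) pairs and flattens them into one state vector."""

    def __init__(self, k, pad_rate=0.0, pad_action=0.0):
        self.k = k
        # Pre-fill with padding so the vector has a fixed length from the first step.
        self.history = deque([(pad_rate, pad_action)] * k, maxlen=k)

    def push(self, heart_rate, action):
        # The oldest pair is dropped automatically once k pairs are stored.
        self.history.append((float(heart_rate), float(action)))

    def vector(self, current_rate):
        # Layout: [current_rate, rate_1, action_1, ..., rate_k, action_k], oldest first.
        flat = [float(current_rate)]
        for rate, action in self.history:
            flat.extend((rate, action))
        return flat
```

    Because the whole window travels inside each stored state, transitions can still be sampled from the replay memory in random order; the network input size becomes 1 + 2k. A recurrent Q-network is the other common option, at the cost of sampling sequences instead of single transitions.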
    Best Books to Learn Reinforcement Learning
    submitted by /u/Lakshmireddys [link] [comments]  ( 9 min )
    "gp.t: Learning to Learn with Generative Models of Neural Network Checkpoints", Peebles et al 2022
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Autonomous Driving: Ellipsoidal Constrained Agent Navigation | Swaayatt Robots | Motion Planning Research
    submitted by /u/shani_786 [link] [comments]  ( 10 min )
    DQN Agent stuck at local Minima (Probably)
    I'm attempting to address a Day Ahead Electricity Market bidding problem. The concept revolves around purchasing electricity during the lowest price hours and selling it during the highest price hours to maximize profit. I possess 5 years of data featuring variables such as predicted wind speed, predicted temperature, predicted net load, predicted price, and more. I'm employing reinforcement learning and have made attempts to implement Deep Q Learning using the Stable-Baselines3 library. Each episode consists of 24 steps, corresponding to the 24 hours in a day, with each step representing the progression to the next hour. The ultimate objective is to maximize profits by the end of the day. Here are the configuration settings: - Learning rate: 0.0001 - Gamma: 1.0 - Exploration start: 1.…
    6DOF Simulation RL Capability
    I have a 6DOF Simulink model of an autonomous underwater vehicle that has properties [u v w p q r x y z phi theta psi] and two inputs [theta1 theta2] that govern the angle of control surfaces. Ocean current and depth are taken into account. How feasible would it be to use RL to reach waypoints at various [x, y, z] positions? I have a feeling hyperparameter tuning might play a larger role in this? I expect training times to increase exponentially as well? I have done this using a single randomly spawned waypoint with a simple unicycle kinematic model, in both Simulink/MATLAB and Python with a vectorized/parallel environment using SB3/PettingZoo/Gym. submitted by /u/VisionZUS [link] [comments]
    Recommended 'seeding' approach when training/evaluating an experiment
    Dear all, As part of my studies, I am running some RL experiments in which I want to compare some different catastrophic forgetting approaches in sequential task learning. I am using PPO as a baseline. What is the usual experimental setting in relation to seeds used during training and evaluation? If I do for example 3 trainings for a given approach using a different seed for each training, what is the best way of doing the evaluation afterwards? Let's say I have Approach/algorithm A -> train 3 times with 3 seeds -> model_A1, model_A2, model_A3 Then I would like to use 3 different seeds for the evaluation, so to evaluate each of the previously trained models over a set of episodes (deterministic) for each evaluation seed, and get averaged rewards (or median). I wonder whether I might be over complicating things, so I would like to ask you for suggestions. To give a bit of context, this is not intended for a paper, but as part of my master studies, so conditions are a bit more relaxed. Thanks in advance for your insights and suggestions submitted by /u/cotorritaloca80 [link] [comments]
  • Open

    [D] Combining data transformation and scaling techniques
    I am cleaning a dataset for a (macro-economic) demand forecast, and I'm wondering when one should apply data transformation. When is it recommended to include Box-Cox or Yeo-Johnson, and how should we choose between the two? How does it affect the feature selection or model performance? Additionally, how should we select the appropriate scaling technique (normalizing, standardizing, min-max) and does the order in which we transform and scale matter for our data? Is there any recommended literature on this? submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
    [D] GPU-compatible SNN-libraries in 2023?
    Hello, I am currently using snnTorch for a video classification task and I achieve fine results, however the training process is really, really slow. I was hoping to utilize my GPU for this task, and while there seem to be alternatives, I was hoping to see if anyone will vouch for any of these, or a different one: https://github.com/norse/norse https://github.com/BindsNET/bindsnet https://github.com/fangwei123456/spikingjelly https://github.com/UCI-CARL/CARLsim6 My priorities are, in order: Windows support; potential transferability to in-memory compute hardware; PyTorch compatibility. submitted by /u/SlayahhEUW [link] [comments]  ( 9 min )
    [R] Meta AI: Towards a Real-Time Decoding of Images from Brain Activity
    Brain decoding tech has improved a lot recently thanks to AI/ML, enabling reading out visual perceptions from fMRI brain scans. But fMRI is too slow for real-time BCIs. A new study from Meta's AI research team pushes brain reading into real-time using MEG, which measures whole-brain activity at super-fast millisecond resolution. They built a 3-part pipeline to decode MEG signals: Embed images into latent spaces using pretrained models like CLIP. Train MEG-specific ConvNet to predict embeddings from MEG data. Generate images from MEG embeddings with diffusion model. They tested it on 20k+ natural images. MEG decoding was 7X better than old methods, hitting 70% top-5 accuracy in retrieving the right images. Generated images matched semantics decently but lacked fine visual details compared to fMRI. MEG seems more focused on high-level category info whereas fMRI captures more low-level features. This could enable visual BCIs for paralysis, etc. ... honestly, a world where we can decode brain images in real time is pretty crazy. The findings also raise some important ethical considerations around privacy of decoded mental content... (wow, that was a weird sentence to write!). TLDR: New MEG pipeline decodes dynamic visual data from brain activity in real-time. Good but not yet photorealistic-quality image generation. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Can someone ELI5 the birch clustering algorithm?
    https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html I'm looking at the parameters here and I'm confused on how there is no distance metric? What is assumed about the data going in if there is no distance metric or precomputed distance option? For example, can I run this with binary data (1/0), what about data w/ missing values? Does it assume the samples are normally distributed? submitted by /u/o-rka [link] [comments]  ( 9 min )
    [R] xVal: A Continuous Number Encoding for Large Language Models - The Polymathic AI Collaboration 2023 - Using the numbers directly instead of tokenizing them increases performance significantly!
    Paper: https://arxiv.org/abs/2310.02989 Twitter discussion: https://x.com/andrew_n_carr/status/1714326003030638848?s=20 Shows in my opinion that tokenizers are clouding the understanding of LLMs and that using the data directly is better. https://x.com/karpathy/status/1657949234535211009?s=20 Karpathy thinks the same! Abstract: Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose XVAL, a numerical encoding scheme that represents any real number using just a single token. XVAL represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an inductive bias that is generally more suitable for applications in scientific domains. We empirically evaluate our proposal on a number of synthetic and real-world datasets. Compared with existing number encoding schemes, we find that XVAL is more token-efficient and demonstrates improved generalization. https://preview.redd.it/qq8u066smzub1.jpg?width=1344&format=pjpg&auto=webp&s=498be8488c00147f0a7443050519dcf535fae126 https://preview.redd.it/dxqd4wpsmzub1.jpg?width=1499&format=pjpg&auto=webp&s=266689a80b31cb31fdc4167043f7abdb4f683100 https://preview.redd.it/0yy93xpsmzub1.jpg?width=1497&format=pjpg&auto=webp&s=b5eae8b958f03afc3c8c85a95c115e48aed1d06e submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
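    A minimal sketch of the encoding idea stated in the abstract above ("scaling a dedicated embedding vector by the number value"), in plain Python. The function names, the `[NUM]` placeholder string, and the random embedding table are assumptions for illustration; the paper's preprocessing, any value normalization, and its number-inference head are not reproduced here.

```python
import random


def make_embedding_table(vocab, dim, seed=0):
    """Hypothetical stand-in for a learned embedding table: one random vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def embed_with_number_scaling(tokens, table, num_token="[NUM]"):
    """Embed a mixed sequence of text tokens (str) and numbers (int/float).

    Every number maps to the same dedicated vector, multiplied by its value,
    so the embedding varies continuously with the number.
    """
    embedded = []
    for tok in tokens:
        if isinstance(tok, (int, float)):
            base = table[num_token]
            embedded.append([tok * x for x in base])
        else:
            embedded.append(list(table[tok]))
    return embedded
```

    For example, `embed_with_number_scaling(["mass", 2.0], table)` returns the ordinary vector for "mass" followed by twice the `[NUM]` vector, so a sequence uses one position per number regardless of how many digits it has.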
    [P] A Guide to Building LLM-Based Applications with Code Llama
    Have you ever wondered about how to take advantage of the power of large language models (LLMs) and Generative AI at the edge? Our latest blog, A Guide to Building LLM-Based Applications with Code Llama, shows you how you can use Code Llama on an edge device to build a customized dashboard application. This tutorial shows how Code Llama can empower analysts in remote, restricted environments to build applications with minimal connectivity and compute capacity. In this tutorial, we’ll walk you through how to run Code Llama on an edge device in a remote location to build a customized dashboard application. submitted by /u/modzykirsten [link] [comments]  ( 9 min )
    [R] LLMs can threaten privacy at scale by inferring personal information from seemingly benign texts
    Our latest research shows an emerging privacy threat from LLMs beyond training data memorization. We investigate how LLMs such as GPT-4 can infer personal information from seemingly benign texts. The key observation of our work is that the best LLMs are almost as accurate as humans, while being at least 100x faster and 240x cheaper in inferring such personal information. We collect and label real Reddit profiles, and test the LLMs' capabilities in inferring personal information from mere Reddit posts, where GPT-4 achieves >85% Top-1 accuracy. Mitigations such as anonymization are shown to be largely ineffective in preventing such attacks. Test your own inference skills against GPT-4 and learn more: https://llm-privacy.org/ Arxiv paper: https://arxiv.org/abs/2310.07298 WIRED article: https://www.wired.com/story/ai-chatbots-can-guess-your-personal-information/ submitted by /u/bmislav [link] [comments]  ( 9 min )
    [D] GAN that manipulates shape, texture, color, position, angle
    I remember seeing a paper on manipulating or changing an object's attributes. It came out rather recently and seemed to work really well, but I just can’t find it anymore. All I know of is "Counterfactual Generative Networks" by A. Sauer & A. Geiger (2020). I’d really appreciate it if anyone can share similar work, especially if causally motivated. submitted by /u/Glittering_teapot [link] [comments]  ( 9 min )
    [P] Best Way to Create a Custom Chatbot from Personal Data (PDF, etc.)
    Hello fellow Redditors! I am looking for some guidance on creating a custom chatbot using my own data, which is currently in PDF format. I've explored various options like Azure, Pinecone, and I've heard about the AskYourPDF API, but I'm not sure which one would be the best fit for my project. I want to keep things simple, so I'm reaching out to the community to ask for recommendations or advice on the easiest and most effective way to build a website with a personalized chatbot. If you have experience with similar projects or know about user-friendly tools or platforms, please share your insights. I appreciate any suggestions, tips, or pointers you can provide. Thank you in advance for your help! TL;DR: Need advice on the simplest way to create a website with a personalized chatbot using my own data (PDF format). Seeking recommendations and tips from the community. submitted by /u/Huge-Number-4299 [link] [comments]  ( 9 min )
    [P] Where do I gather the dataset for my FYP
    I am doing a Machine Learning project for my FYP; I haven't worked on any ML project yet but I am excited about it. It is related to voice/facial emotion detection. is there any platform that provides datasets for ml projects? Like without any copyright issues (if that's even a thing in ml datasets idk?) A total beginner here. submitted by /u/fewdiepie_ [link] [comments]  ( 9 min )
    [P] I made a finetune of CodeLlama to resolve merge conflicts!
    I made a finetune of CodeLlama-7b for resolving merge conflicts following up on an IEEE study from 2022. The demo is here if anyone wants to check it out and give some feedback. It would help a ton for future versions improving the dataset and going forward with the 13b and 34b models submitted by /u/codys12 [link] [comments]  ( 9 min )
    [Discussion] how much 'error' should i apply when training with synthetic data?
    hi there i'm trying to build a small ai that formats texts. of course the current formatting applications applied on ide, search engine, ms softwares, notetaking apps are well functioning, but this is more for educational purpose & self interest. since i don't have infinite amount of time and money, i'm thinking of using open sourced text data and generate synthetic data using gpt3.5 or some kind of algorithm to unformat them. so this is the part where i'm stuck. when adding some errors such as inappropriate multilines, tabs, typos, how much should i add? it would be best if i knew some kind of distribution of text errors people make in everyday life, but i don't have any. i don't want to make this training too hard so i'm not really thinking to destroy the text, but rather add some appropriate level of errors. but, would it help this ai model to learn better if i add extra errors? or is this all just something i would have to figure out by myself? any comments would be appreciated! submitted by /u/Strange_Dog8104 [link] [comments]  ( 9 min )
    [R] Open-source video translate solutions
    Hi there! are there any open-source solutions for video translation? i mean replacing video's audio stream with translated one in different language (which is in sync with the picture) - not necessarily alter mouth movements in the video. submitted by /u/curryprogrammer [link] [comments]  ( 9 min )
    [Research] Literature survey query
    Hi all, First time posting here. I am doing my PhD in Language Conditioned Robotics. I am currently writing a literature review paper on the current state of the field and how it can be further improved. I am covering topics such as generative AI and LLMs in there. I would be more than grateful if you could send some literature review papers in the field of ML so I understand how to structure and write my paper and also what I should focus on more. It doesn't necessarily have to be related to my PhD topic (but if they are it will help quite a bit). I would be more than happy if anyone can also share their experience. Thank you for your time! submitted by /u/bizzonkiller [link] [comments]  ( 9 min )
    [D] What are some of the best library frameworks to use for speech2text and text2speech AI chatbot
    Hey guys, what are some of the best libraries to use to make a voice conversational AI chatbot? I googled around and found Vocode. It looks pretty good. However, Vocode relies on several other (paid) closed-source services such as Deepgram (for transcribing) and Azure AI Speech (for synthesising). Are there any other libraries/frameworks available out there which are completely or more open source? submitted by /u/redd-dev [link] [comments]  ( 9 min )
    [R] EFFICIENT STREAMING LANGUAGE MODELS WITH ATTENTION SINKS
    submitted by /u/Username912773 [link] [comments]  ( 9 min )
    6DOF Sim RL Capability [P]
    I have a 6DOF Simulink model of an autonomous underwater vehicle that has properties [u v w p q r x y z phi theta psi] and two inputs [theta1 theta2] that govern the angle of control surfaces. Ocean current and depth are taken into account. How feasible would it be to use RL to reach waypoints at various [x, y, z] positions? I don’t want to use a PID controller or anything, not even RL to tune a controller. The agent would choose the theta inputs directly. I have a feeling hyperparameter tuning might play a larger role in this? I expect training times to increase exponentially as well? I have done this using a single randomly spawned waypoint with a simple unicycle kinematic model, in both Simulink/MATLAB and Python with a vectorized/parallel environment using SB3/PettingZoo/Gym. submitted by /u/VisionZUS [link] [comments]  ( 9 min )
    [R] BitNet: Scaling 1-bit Transformers for Large Language Models
    Arxiv link – BitNet: Scaling 1-bit Transformers for Large Language Models In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement of the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. Furthermore, BitNet exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models while maintaining efficiency and performance benefits. submitted by /u/PantsuWitch [link] [comments]  ( 9 min )
    [P] Achieving peak performance on GPU
    Hi r/MachineLearning! I recently went into the CUDA programming rabbit hole. In the process, I came across matrix multiplication and was amazed by how complicated the algorithm is in CUDA (especially if you want to get the best performance). I found the learning process quite gruelling (the CUDA docs were very average), so I wrote a tiny blog which hopefully helps anyone in the same position. You can read the blog on Medium (no paywall) or HackMD. It would probably be quite useful if you want to get a deeper intuition of how things like OpenAI Triton or FlashAttention work under the hood. Accompanying this is an implementation of a 3-hidden-layer MLP trained on MNIST in pure CUDA. Benchmarking this against PyTorch, it gets up to 6x higher end-to-end training speed for small (h=128) networks, and is asymptotically 20% faster for large (h=8192) ones! https://preview.redd.it/txx2txbvlzub1.png?width=2400&format=png&auto=webp&s=7bb136b9fb535bc58fd7ee809bbbca6f68dc8953 It's worth noting that I tried reasonably hard optimising the PyTorch implementation by using full fp16, torch.compile with fullgraph=True, mode="max-autotune", and pre-loading all data to GPU up-front (I also did this for the CUDA implementation). The main takeaways I got are: For small networks, PyTorch/Python still incurs a significant overhead, even if you try pretty hard to optimise it. For large networks, most of the speedup comes from using fp16 accumulation for matrix multiplication (instead of PyTorch's fp32). This obviously reduces stability, but at least in my case, I didn't observe any numerical issues. In cases where we can get away with fp16, we might be leaving a significant amount of performance on the table! Anecdotally, you have to try really hard in CUDA to even get close to the performance of PyTorch, but it is possible to beat it if you try hard (suffer) enough. You can check out the repo here: https://github.com/andylolu2/cuda-mnist. Would love to hear some feedback! submitted by /u/bjergerk1ng [link] [comments]  ( 10 min )
  • Open

    Institute Professor Daron Acemoglu Wins A.SK Social Science Award
    The award honors research on public policy with a focus on economic and governmental reforms.  ( 7 min )
  • Open

    Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions
    Purina US, a subsidiary of Nestlé, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes. Purina consistently […]  ( 9 min )
  • Open

    Understanding the user: How the Enterprise System Usability Scale aligns with user reality
    This position research paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities. In the business world, measuring success is as critical as selecting the right […] The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.  ( 10 min )
  • Open

    NVIDIA Expands Robotics Platform to Meet the Rise of Generative AI
    Powerful generative AI models and cloud-native APIs and microservices are coming to the edge. Generative AI is bringing the power of transformer models and large language models to virtually every industry. That reach now includes areas that touch edge, robotics and logistics systems: defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interactions and […]  ( 8 min )
    Making Machines Mindful: NYU Professor Talks Responsible AI
    Artificial intelligence is now a household term. Responsible AI is hot on its heels. Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous. In the latest episode of the NVIDIA AI Podcast, host Noah […]  ( 6 min )
    Into the Omniverse: Marmoset Brings Breakthroughs in Rendering, Extends OpenUSD Support to Enhance 3D Art Production
    Real-time rendering, animation and texture baking are essential workflows for 3D art production. Using the Marmoset Toolbag software, 3D artists can enhance their creative workflows and build complex 3D models without disruptions to productivity.  ( 7 min )
    Foxconn and NVIDIA Amp Up Electric Vehicle Innovation
    NVIDIA founder and CEO Jensen Huang joined Hon Hai (Foxconn) Chairman and CEO Young Liu to unveil the latest in their ongoing partnership to develop the next wave of intelligent electric vehicle (EV) platforms for the global automotive market. This latest move, announced today at the fourth annual Hon Hai Tech Day in Taiwan, will […]  ( 6 min )
  • Open

    Portable sed -i across MacOS and Linux
    The -i flag to ask sed to edit a file in place works differently on Linux and MacOS. If you want to create a backup of your file before you edit it, say with the extension .bak, then on Linux you would run sed -i.bak myfile but for the version of sed that ships with […] Portable sed -i across MacOS and Linux first appeared on John D. Cook.  ( 6 min )
  • Open

    Best Books to Learn Neural Networks in 2023 for Beginners (Updated)
    submitted by /u/Lakshmireddys [link] [comments]

  • Open

    Roughly how much time will a task running on a RTX 3060 take VS a ~i7 CPU? [Discussion]
    Anyone have examples of tasks run between the two? Doesn't need to be exact. submitted by /u/Apita2000 [link] [comments]  ( 8 min )
    [D] Feedback on my MVP project - Pre-Recorded Standardized Video Interviews Job Site for Data Professionals
    Hey! Startup: - Apply Script dot com "Connect business and data professionals via pre-recorded standardized video interviews." More details: Problems with Traditional Hiring - Outdated: The current method of conducting interviews has become overly complex and outdated. - Time-Wasting: The process involves too many appointments, meetings, and stages, leading to communication errors. - Expensive: The man-hours invested by HR and engineering teams are costly. - Constraining: Interviews are fixed to specific times and locations. - Cumbersome: The experience is challenging for both businesses and professionals. Our Solution + Talent Identification: We find top talent that matches your job post. + Standardized Interviews: Professionals standardized pre-record their …  ( 9 min )
    [D] Help identifying research papers for online / cyclic / sequential learning?
    So my situation is that I have a pretrained model and we get a new update of data every month (note: this monthly data is very small compared to the original dataset; the original dataset was about 5 years' worth, or ~60x the size of any given monthly update). How can I update my pretrained model on the much smaller set of new data, learning from the data without overfitting to it? Or frankly, what would be better, if it is possible, would be to extend my pretrained model such that it learns from the new data and can then be more tightly fit to that month's data. So something like meta-learning or local fine-tuning, but I want to continue to update and improve my pretrained model so that I have a base model that can do well on each month's new data. Does anyone know anything like this, or have advice on terms to look into, beyond just transfer learning or regularization? submitted by /u/Amun-Aion [link] [comments]  ( 9 min )
    How to properly implement Cover's Theorem in an SVM? [P]
    Maybe this belongs elsewhere since it's probably a dumb basic question, but basically I'm taking an undergrad course in AI and we've been given a classification problem. We were told as a "hint" to recall Cover's Theorem when separation fails, but the issue is she also wants us to draw a rough sketch of the data with the separator. Mine failed in a basic scatterplot so I upped the dimension by 1 but it also wasn't separable in R3 (which is annoying to draw anyway but could have been done), if I keep going then it might work at some point but idk how I'm meant to draw the data if it's separated in R4 or beyond. If it works in R4 do I just sketch the data in R3 and just draw a 3 dimensional point where w = 0? But even then if it goes beyond R4 it becomes way more annoying. So I'm assuming my implementation is just wrong, maybe the formula I used was wrong. Can someone show what a proper implementation looks like and how we're meant to up dimensions? Don't wanna post what I tried bc it has starter code and stuff baked into it which might allow my professor to find this post 😂 submitted by /u/Traditional_Land3933 [link] [comments]  ( 9 min )
    [D] Cross Entropy Classification vs Metric Learning + k-NN for image classification?
    Hi guys. We've all seen how hot RAG and vector DBs have been lately. How good are retrieval-based approaches for image classification? More concretely: Suppose we have a network trained with metric learning and a massive, diverse set of labelled examples to retrieve from. We've just been tasked to do classification with a fixed number of classes, and we've narrowed it down to two options: Embed our dataset using our metric learning network, throw the embeddings into a vector DB, and do k-NN Train a classifier via cross-entropy loss Which approach would we expect to provide better performance? What are the trade-offs? Any insight is appreciated! submitted by /u/supersmartypants [link] [comments]  ( 9 min )
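    A minimal sketch of the first option in the post above (retrieve the k nearest labelled embeddings and take a majority vote), written in plain Python so it runs without a vector DB. The function names are hypothetical, cosine similarity is an assumed choice of metric, and the exhaustive sort stands in for the approximate nearest-neighbour search a real vector DB would perform. Embeddings are assumed to be non-zero vectors.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, gallery, labels, k=3):
    """Return the majority label among the k gallery embeddings most similar to query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine(query, gallery[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

    One trade-off this makes visible: adding a class or correcting a label only means inserting or editing gallery entries, with no retraining, whereas inference cost and memory grow with the gallery size. A cross-entropy classifier has fixed inference cost but needs retraining when the class set changes.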
    [D] Graph Neural Networks - Links Prediction Task on Directed, Heterogenous Multigraphs
    Hi guys, I have the following use case at hand for my thesis, and I'd like to ask for some help to formulate my problem: A directed multigraph (1 node type, multiple edge types) Each node and edge have their own attributes A set of graphs that are fully labeled. The dataset is self-created according to some technical rules. Training is supposed to be done on this dataset. My task is to perform link prediction in the inductive setting. This means that given an unseen incomplete graph at the inference time, the model should be able to predict all the missing links. I have read many papers and tried to formulate my problem in many directions. Since I am also new to GNNs, I would prioritize papers with an existing codebase and sound theoretical justifications for the techniques (which …  ( 10 min )
    Trouble improving accuracy in face recognition dataset [P]
    Hey everyone, I'm trying my hand at the Labeled Faces in the Wild dataset for a face recognition task. I have built a Siamese model, and my loss curve looks great, but my accuracy stays at 0.500 no matter what I try. Has anybody here tried this task before who can give me some tips to improve my accuracy? I am implementing it in Python with PyTorch, btw. Thanks in advance! submitted by /u/Due_Concentrate1279 [link] [comments]  ( 9 min )
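One possible cause among several, sketched under assumptions since the post shows no code: if accuracy is computed by comparing pair distances against a fixed cut-off that the distances never cross, every pair gets the same prediction and a balanced set scores exactly 0.5 even while the loss improves. The distances below are hypothetical.

```python
def best_threshold(distances, same_labels):
    """Pick the distance cut-off that maximises verification accuracy.

    distances: embedding distance for each pair.
    same_labels: True if the pair shows the same person.
    """
    best_acc, best_t = 0.0, None
    for t in sorted(set(distances)):
        correct = sum((d <= t) == same for d, same in zip(distances, same_labels))
        acc = correct / len(distances)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_t, best_acc

# Hypothetical distances: every value is below 0.5, so a fixed 0.5 cut-off
# predicts "same" for all pairs.
distances = [0.05, 0.08, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35]
same = [True, True, True, True, False, False, False, False]

fixed_acc = sum((d <= 0.5) == s for d, s in zip(distances, same)) / len(same)
threshold, tuned_acc = best_threshold(distances, same)
```

The cut-off should be chosen on a validation split, not on the pairs used for the final score. If the distances for same and different pairs overlap completely, the problem is in the embeddings rather than the threshold.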
    How valuable is a PhD in science (with applied ML) compared to a PhD in only Machine learning [D]
    Is it more advantageous to pursue a PhD in machine learning with a focus on scientific applications for example (Machine learning for drug design) if the end goal is to work in the machine learning industry? Or is a general PhD in machine learning more valuable for this career path? Thank you submitted by /u/Neat-Print2792 [link] [comments]  ( 9 min )
    [R] 85% of the variance in language model performance is explained by a single factor (g, a unified measure of LLM ability)
    TL;DR and paper link are at the bottom of the post. I'm an undergrad who just wrote my first paper completely solo. Crazy experience with so many highs and lows, but I learned a lot from it. I think the results are important and I want people to see them, so I'll try to walk through the paper here as best as I can. I also have a small request for Arxiv enjoyers at the end. Given the nature of Reddit posts, I'll focus a bit less on the methods and more on the results. I won't cite stuff here either, but obviously you can find citations in the paper. First I'll give a small bit of historical context to what I'm doing, then walk through what I did and what came of it. Enjoy the read. The general intelligence factor in humans In the early 1900s, Charles Spearman observed that children's …  ( 14 min )
    [D] How to design API of Machine learning library
    In the past nine years of my deep learning journey, I have come across a vast number of frameworks. Lua Torch was a fantastic framework that initially died due to a lack of Python's ecosystem, but then rose again as PyTorch. Theano was also a great framework, but its major drawback was difficult debugging. I remember spending two weeks writing a Neural Turing Machine for solving bAbI tasks on theano. (Nowadays, it would take a couple hours on Pytorch). Tensorflow - I still don't understand what that was, a terrible framework. There was also Caffe, which was popular in computer vision. Julia is another language that attempted to introduce automatic differentiation as a built-in feature. And JAX, which I was originally biased against since it's a Google product. But some close friends persuaded me to try it, and I actually liked it. However, I thought that it would be difficult for JAX to gain widespread adoption in the community, as PyTorch already had a strong network effect and was gaining traction quickly. I didn't see how anyone could catch up with PyTorch. Another issue with JAX is that it requires additional cognitive load for developers. Take a look: https://higgsfield.substack.com/p/how-to-design-api-of-machine-learning submitted by /u/Good-Willingness-985 [link] [comments]  ( 9 min )
    [P] 2D Gaussian Splatting a great starting point for people who want to delve deeper
    Github : https://github.com/OutofAi/2D-Gaussian-Splatting https://i.redd.it/cwgsjtko1sub1.gif submitted by /u/TerryCrewsHasacrew [link] [comments]  ( 8 min )
    [D] How to Build Data Products? Deploy: Part 3/4 - Doubling down on the power of Unified Experiences for building state of the art models.
    Data products play an important role in building state-of-the-art machine learning models. Although the process of building them still seems a bit confusing within industry, this article series tries to simplify it by breaking it down into 4 steps. Take a look: https://moderndata101.substack.com/p/how-to-build-data-products-deploy What processes are being followed at your org for building scalable data products? submitted by /u/growth_man [link] [comments]  ( 9 min )
    Shared Public Contextual Database for RAG [D]
    Hey Guys, It seems RAG is really taking off as an increasingly popular use case for LLMs to leverage contextual data. However, everybody is building their own contextual data sets and embedding them in their own siloed vector dbs. Do you guys think there's any utility in having a shared public vector db that anyone can tap into through an API, without having to self-host, worry about the embedding pipelines, or fill the vector db with enough data in the first place for their use cases? Would this save devs a lot of time in quickly testing product ideas? (Albeit it does seem that proprietary data is what everyone's raving about today.) - For context, I'm building a social media product where users can upload a few pieces (approx 10) of content (social media posts, websites, videos to start with), which become the verified human-curated list/Niche. We then classify and embed this into a vector db. From this, we have set up a data pipeline to scrape the web and find the most similar new content, which we suggest to users to add to the Niche (upvote, downvote style). When a piece of content is upvoted, it's added to the verified list, updating the Niche's classification string. Essentially we're aiming to construct an ever-growing, user-curated, contextually classified vector database from a relatively small set of sample data. submitted by /u/niksteel123 [link] [comments]  ( 9 min )
    [D] Work regarding using LLMs to generate data for downstream tasks.
    Hi. I'm curious if there have been any studies done regarding the effects of using data generated by LLMs for other downstream tasks. The closest that I could find are the two papers: Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias (Yu et al., 2023) Generating Training Data with Language Models: Towards Zero-Shot Language Understanding (Meng et al., 2022) The former focuses on studying the differences between the type of prompts that are used to generate the data and the latter doesn't use LLMs. Doesn't have to be papers, blog posts or any sort of information regarding the scenario I described is fine. Thanks. submitted by /u/Seankala [link] [comments]  ( 9 min )
    [D] Embedding models in production(CPU w/ high throughput)
    Hello, I am working on an app that requires creating lots of text embeddings (100M tokens). Looking at OpenAI Ada pricing (and considering that my app doesn't yet make any money), I'm looking into self-hosting a model to run on CPU. I know that constrains me towards smaller models. So far I've been testing locally with sentence-transformers/all-MiniLM-L6-v2 and the query results seem okay-ish enough for my MVP. (Although, I should note that I haven't compared how embeddings from other models would perform.) Does anyone have experience doing something similar? In particular, I'd love to hear about any tips you have for maximizing the number of embeddings per second. (New to ML/MLOps, so apologies if this is a silly question :) submitted by /u/rsamrat [link] [comments]  ( 9 min )
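One commonly used throughput trick, sketched without any ML dependency: group texts of similar length into the same batch so less compute is wasted on padding, then restore the original order. `embed_batch` is a placeholder for whatever model call is used, and `fake_embed` exists only so the sketch runs; some libraries may already sort by length internally, so it is worth checking before adding this yourself.

```python
def length_sorted_batches(texts, batch_size):
    """Yield (original_indices, batch) with similar-length texts grouped together."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [texts[i] for i in idx]

def embed_all(texts, embed_batch, batch_size=64):
    """Embed texts in length-sorted batches and return results in input order.

    embed_batch is a placeholder: it takes a list of strings and returns
    one vector per string.
    """
    out = [None] * len(texts)
    for idx, batch in length_sorted_batches(texts, batch_size):
        for i, vec in zip(idx, embed_batch(batch)):
            out[i] = vec
    return out

def fake_embed(batch):
    """Stand-in "model": one number per text, so the sketch is runnable."""
    return [[float(len(t))] for t in batch]

texts = ["a b c d e f", "a", "a b c", "a b"]
vectors = embed_all(texts, fake_embed, batch_size=2)
```

Word count is only a rough proxy for token count; sorting by tokenized length would be more accurate at the cost of tokenizing twice or caching the tokens.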
    [N] Introducing Stable Fast: An ultra lightweight inference optimization library for HuggingFace Diffusers on NVIDIA GPUs
    What is this? stable-fast is an ultra lightweight inference optimization library for HuggingFace Diffusers on NVIDIA GPUs. stable-fast provides super fast inference optimization by utilizing some key techniques and features: CUDNN Convolution Fusion: stable-fast implements a series of fully-functional and fully-compatible CUDNN convolution fusion operators for all kinds of combinations of Conv + Bias + Add + Act computation patterns. Low Precision & Fused GEMM: stable-fast implements a series of fused GEMM operators that compute with fp16 precision, which is faster than PyTorch's defaults (read & write with fp16 while computing with fp32). NHWC & Fused GroupNorm: stable-fast implements a highly optimized fused NHWC GroupNorm + GELU operator with OpenAI's triton, which eliminates the need…  ( 9 min )
    [R] Does the Flan T5 decoder take the question as input ?
    Hello, I was looking at the Flan T5 paper and code. It was clear that the question (instruction) and the context are given to the encoder as input. But I find no details on what does the decoder take as input apart from the fact that it starts with the pad token. Anyone can give me more details please ? Thanks ! submitted by /u/Meddhouib10 [link] [comments]  ( 9 min )
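A sketch of what the decoder sees in a T5-style encoder-decoder. The question is not fed to the decoder as tokens; the decoder reads it only through cross-attention over the encoder output. During training the decoder input is the target answer shifted right behind the start token (teacher forcing), and at inference it is the decoder's own previous outputs. The token ids and the `next_token` callable below are hypothetical stand-ins, not the library's API.

```python
PAD_ID = 0  # T5 uses the pad token as the decoder start token

def shift_right(target_ids, start_id=PAD_ID):
    """Training-time decoder input: start token, then the target minus its last token."""
    return [start_id] + list(target_ids[:-1])

def greedy_decode(next_token, eos_id, max_len=10, start_id=PAD_ID):
    """Inference: the decoder is fed its own previous outputs, one token at a time.

    next_token is a placeholder for the model. It maps the decoder input so far
    to the next token id (the real model also attends to the encoder output).
    """
    decoder_input = [start_id]
    while len(decoder_input) <= max_len:
        tok = next_token(decoder_input)
        if tok == eos_id:
            break
        decoder_input.append(tok)
    return decoder_input[1:]

target = [37, 1712, 19, 1]           # hypothetical answer token ids, ending in EOS
decoder_input = shift_right(target)  # [0, 37, 1712, 19]

# Toy "model" that replays the target, to show the shape of the loop only.
generated = greedy_decode(lambda prefix: target[len(prefix) - 1], eos_id=1)
```

At each position the decoder predicts the next target token, so position 0 (the pad token) predicts the first answer token, and so on.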
    [D] TensorFlow.js and state of the ecosystem for JavaScript
    I am curious about the state of the ecosystem for JavaScript, where TF looks like a reasonably solid option. Options I have found so far are: TensorFlow.js (looks like the most complete solution, but the general sentiment about TF in Python is pretty bad!); MediaPipe (to quickly implement specific use cases it seems, maybe using tf.js in the background?); ml5.js (a layer on top of tf.js to make it more approachable, if I understand correctly); transformers.js (haven't quite grasped this one); shumai (bun only, so server side only). I am curious to read informed opinions about these and more! Have you used them and how? submitted by /u/gtnbssn [link] [comments]  ( 9 min )
    [D] Which raw OPS/s benchmarks best reflect ML/DL workloads?
    Hi all. I'm preparing an open website about products used for AI/ML/DL computation (no in-house testing for now, just the database and GUI). However comparing raw speed of products of different vendors is more challenging than I anticipated, because there are many possible raw performance indicators, only few of which are provided by vendors. For example a raw performance indicator can be "FP32 vector with opportunistic optimization", while another can be "BF16 matrix/tensor without opportunistic optimization". A full picture of raw performance would be fully represented only by a table with multiple dimensions: Number format (FP64, FP32, TF32, FP16, BF16, FP8, INT8, INT4... are the others?) Vector vs. matrix/tensor operation (boolean) Opportunistic optimizations like Nvidia Sparsity…  ( 9 min )
    [D] Interesting loss graphs
    Wondering if anyone has some interesting loss graphs that they could share. Maybe loss suddenly dropped after 100 epochs, or a local minimum was found and then it jumped into a lower one. Wondering if anyone forgot to turn off training and came back to a better result than the one they thought training had already converged to. submitted by /u/HStuart18 [link] [comments]
  • Open

    Thoughts on new ChatGPT features
    I've had access to the Dall-E 3, Vision and voice chat features, and I've been blown away by how impressive each of the new features is. Dall-E 3 seems roughly comparable to Midjourney in overall image quality, but does a much better job at understanding the prompt. The vision model continues to surprise by how well it is able to understand images at a seemingly human level of comprehension. And the voice chat is such an intuitive and captivating way of interacting with ChatGPT, it felt like I was interacting with one of the AI assistants from the movie "Her". However, it's unfortunate that these amazing new features cannot be used together at the same time. Up until gaining access to these features, I had been using the advanced data analysis model as my default, which is great for helping with programming tasks. I can only imagine how revolutionary ChatGPT will be when a cohesive multi-modal model is released sometime in the near future which has all these capabilities available from the start. What things would you want to try if such a cohesive model was released? I can already imagine some use cases where you could set up iterative improvement for things like interface design, which some people have already got to work with just the base vision model by itself. submitted by /u/ImRealNow [link] [comments]
    U.S. Tightens China's Access to Advanced Chips for Artificial Intelligence
    The Biden administration has announced additional limits on sales of advanced semiconductors by American firms to China, in an effort to restrict China's progress on supercomputing and artificial intelligence. The new rules will likely halt most shipments of advanced semiconductors from the United States to Chinese data centers, which use them to produce models capable of artificial intelligence. Chip makers seeking to sell China advanced chips or the machinery used to make them will be required to notify the government of their plans or obtain a special license. To prevent the risk of advanced U.S. chips reaching China through third countries, chip makers will also need licenses to ship to other countries subject to U.S. arms embargoes. The Biden administration argues that China's access to advanced technology is dangerous as it could aid the country's military in tasks like guiding hypersonic missiles or cracking top-secret U.S. codes. The restrictions may affect Chinese companies developing AI chatbots and could weaken China's economy in the long run, as AI is transforming industries from retail to healthcare. The limits are also expected to impact sales to China of U.S. chip makers such as Nvidia, AMD, and Intel, who earn a significant portion of their revenue from Chinese buyers. The rules will exempt chips used in commercial applications like smartphones, laptops, electric vehicles, and gaming systems. The Semiconductor Industry Association, which represents major chip makers, is evaluating the impact of the updated rules. The Biden administration has been trying to counter China's growing mastery of cutting-edge technologies by investing in new chip factories in the U.S. while setting restrictions on exports of technology to China. Source : https://www.nytimes.com/2023/10/17/business/economy/ai-chips-china-restrictions.html submitted by /u/NuseAI [link] [comments]
    Google: Data-scraping lawsuit would take 'sledgehammer' to generative AI
    Google has asked a California federal court to dismiss a proposed class action lawsuit that claims the company's scraping of data to train generative artificial-intelligence systems violates millions of people's privacy and property rights. Google argues that the use of public data is necessary to train systems like its chatbot Bard and that the lawsuit would 'take a sledgehammer not just to Google's services but to the very idea of generative AI.' The lawsuit is one of several recent complaints over tech companies' alleged misuse of content without permission for AI training. Google general counsel Halimah DeLaine Prado said in a statement that the lawsuit was 'baseless' and that U.S. law 'supports using public information to create new beneficial uses.' Google also said its alleged use of J.L.'s book was protected by the fair use doctrine of copyright law. Source : https://www.reuters.com/legal/litigation/google-says-data-scraping-lawsuit-would-take-sledgehammer-generative-ai-2023-10-17/ submitted by /u/NuseAI [link] [comments]
    [AI Dad Joke] Why did the AI stop being nice?
    It regressed to mean... PS: I read the sidebar which didn't exclude humor, and the flair seems to suggest that it would be okay, but my apologies if not. submitted by /u/Tyler_Zoro [link] [comments]
    👨🏻‍🏫 Generative AI Security Standards, LLM‘s 200K Context Window, Alibaba's Open-Source Obsession, and Baidu World 2023
    submitted by /u/trcytony [link] [comments]
    Can GPT models be financial analysts? ChatGPT, GPT-4 fail CFA exams in new study by JP Morgan, Queens University, and Virginia Tech
    Researchers evaluated ChatGPT and GPT-4 on mock CFA exam questions to see if they could pass the real tests. The CFA exams rigorously test practical finance knowledge and are known for being quite difficult. They tested the models in zero-shot, few-shot, and chain-of-thought prompting settings on mock Level I and Level II exams. The key findings: GPT-4 consistently beat ChatGPT, but both models struggled way more on the more advanced Level II questions. Few-shot prompting helped ChatGPT slightly. Chain-of-thought prompting exposed knowledge gaps rather than helping much. Based on estimated passing scores, only GPT-4 with few-shot prompting could potentially pass the exams. The models definitely aren't ready to become charterholders yet. Their difficulties with tricky questions and core finance concepts highlight the need for more specialized training and knowledge. But GPT-4 did better overall, and few-shot prompting shows their ability to improve. So with targeted practice on finance formulas and reasoning, we could maybe see step-wise improvements. TLDR: Tested on mock CFA exams, ChatGPT and GPT-4 struggle with the complex finance concepts and fail. With few-shot prompting, GPT-4 performance reaches the boundary between passing and failing but doesn't clearly pass. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Let's find out what GPT4 vision can do
    GPT4 vision isn't just a gimmick. We've been given a new superpower, and so we must "deal with it". This is probably as big a moment as when ChatGPT first arrived, maybe more. Machine Vision for the masses (and more). I tried doing some very loose sketches, and it really struggled to identify them until they were coloured in. Humans could easily tell what they were. But, in order to see what uses it has, we need to know what capabilities it does and does not have. Pick a question and see what you can learn! can it use TINY images (I assume they are much faster) can it tell you what has changed in two images? can it measure distances ? (with perspective?) can it make 3d models from instructions? can it "learn" to recognise people/ similar objects (in the same context window) what limits are there to exhaustive listing exhaustive description is it better at details or overviews can it read maps / graphs / text how smart is it on DIY / xrays / mechanics can it follow wires?? (Can it find lego) is there a formal reference system you can use (X/Y) can it give co-ordinates in large grids or grid-like (how un-grid like) ie film strip, or window-panes can it navigate a 2d maze turn-by-turn? 3d maze? can that be insanely complex? can it make ebay descriptions (condition) can it estimate food weight can it estimate strength / angles / volume can it create programs from screenshots. Can it use programs? games? control RC car / robot? what kind of language / instructions are best when talking about images. what other questions do we need submitted by /u/inteblio [link] [comments]
    AI pioneers LeCun, Bengio clash in intense online AI safety, governance debate
    Yann LeCun and Yoshua Bengio, two influential figures in AI and deep learning, engaged in a heated debate over the potential risks and safety concerns surrounding AI. LeCun emphasized the need to design AI systems for safety rather than imagining catastrophic scenarios. Bengio argued for the importance of prudence, stating that we still do not understand how to design safe, powerful AI systems, and highlighted the need for major investment in AI safety and governance. The debate highlighted the disagreement among esteemed researchers about AI's potential risks, the effectiveness of current safety measures, and the best path forward. The implications of AI, including job displacement, privacy violations, and existential risks, have become a topic of widespread concern. Source : https://venturebeat.com/ai/ai-pioneers-yann-lecun-and-yoshua-bengio-clash-in-an-intense-online-debate-over-ai-safety-and-governance/ submitted by /u/NuseAI [link] [comments]
  • Open

    Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker
    Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in […]  ( 8 min )
    Keeping an eye on your cattle using AI technology
    At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create […]  ( 16 min )
    Personalize your search results with Amazon Personalize and Amazon OpenSearch Service integration
    Amazon Personalize has launched a new integration with Amazon OpenSearch Service that enables you to personalize search results for each user and assists in predicting their search needs. The Amazon Personalize Search Ranking plugin within OpenSearch Service allows you to improve the end-user engagement and conversion from your website and app search by taking advantage […]  ( 7 min )
  • Open

    DSC Weekly 17 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 17 October 2023 appeared first on Data Science Central.  ( 20 min )
    Uncharted digital landscapes and the quest for timeless identity
    In a recent podcast episode, Lex Fridman and Mark Zuckerberg convened in the Metaverse, where the digital realm intertwines with reality. Their astonishingly realistic interaction, while highlighting technological advancements, also prompted deeper contemplations. As the line between digital recreations and reality becomes increasingly blurred, it beckons questions about the definitions of identity and consciousness and… The post Uncharted digital landscapes and the quest for timeless identity appeared first on Data Science Central.  ( 22 min )
    Internet Of Things (IOT):  Application In Hazardous Locations
    Introduction to Internet of Things (IOT): Internet of Things (IoT) represents the fourth-generation technology that facilitates the connection and transformation of products into smart, intelligent and communicative entities. IoT has already established its footprint in various business verticals such as medical, health care, automobile, and industrial applications. IoT empowers the collection, analysis, and transmission of… The post Internet Of Things (IOT):  Application In Hazardous Locations appeared first on Data Science Central.  ( 23 min )
    The digital evolution in aviation: how big data and analytics are transforming the industry
    Long before passengers sit back, relax, and enjoy their flight, data has played a critical role in getting them to their seats. It has been a cornerstone of the aviation industry since the early days of air travel. Indeed, from the early 20th century, data was collected through manual processes such as pilots logging information… The post The digital evolution in aviation: how big data and analytics are transforming the industry appeared first on Data Science Central.  ( 20 min )
  • Open

    "STARC: A General Framework For Quantifying Differences Between Reward Functions", Skalse et al 2023
    submitted by /u/gwern [link] [comments]
    "Goodhart's Law in Reinforcement Learning", Karwoski et al 2023
    submitted by /u/gwern [link] [comments]
    Dynamic state and action space
    Hello, I’m working on a scenario that involves many systems and each system involves many subsystems. At each decision time and according to the system that requests the decision, the RL agent must select a subsystem. Nevertheless, each system has a different number of subsystems which makes the action space and the state space dynamic since the each neurone in the output represents a subsystem. Can I use the maximal number of subsystems (not the total number) as the number of the output and masking some neurones according to the current system ? submitted by /u/GuavaAgreeable208 [link] [comments]
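The approach the question describes is usually called action masking. A minimal sketch, assuming the output layer has one unit per subsystem slot, sized to the largest subsystem count of any single system; the logits and the constant `MAX_SUBSYSTEMS` are made-up values for illustration.

```python
import math

MAX_SUBSYSTEMS = 5  # output size: largest subsystem count of any system (assumed)

def action_mask(num_subsystems, size=MAX_SUBSYSTEMS):
    """True for slots that exist in the current system, False for padding slots."""
    return [i < num_subsystems for i in range(size)]

def masked_softmax(logits, valid):
    """Softmax over valid actions only; invalid actions get probability 0."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, valid)]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.2, 1.5, -0.3, 2.0, 0.7]  # hypothetical network outputs
probs = masked_softmax(logits, action_mask(3))  # current system has 3 subsystems
```

The same mask has to be applied wherever the policy distribution is used, including when computing log-probabilities for the update, so that masked slots never receive gradient as if they were real choices. The state can be padded to a fixed size in the same way, with the system identity or the mask itself included as input.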
    Offline rl- interpreting policy
    I am new to RL and have a naive question. How interpretable would the policy be from building a rl algorithm in an offline setting? Could I make inferences about what the optimal sequences would be? submitted by /u/kwsunshine123 [link] [comments]
  • Open

    Goal Representations for Instruction Following
    A longstanding goal of the field of robot learning has been to create generalist agents that can perform tasks for humans. Natural language has the potential to be an easy-to-use interface for humans to specify arbitrary tasks, but it is difficult to train robots to follow language instructions. Approaches like language-conditioned behavioral cloning (LCBC) train policies to directly imitate expert actions conditioned on language, but require humans to annotate all training trajectories and generalize poorly across scenes and behaviors. Meanwhile, recent goal-conditioned approaches perform much better at general manipulation tasks, but do not enable easy t…  ( 7 min )
  • Open

    Striking Performance: Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows
    GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.  ( 7 min )
    NVIDIA RTX Video Super Resolution Update Enhances Video Quality, Detail Preservation and Expands to GeForce RTX 20 Series GPUs
    NVIDIA today announced an update to RTX Video Super Resolution (VSR) that delivers greater overall graphical fidelity with preserved details, upscaling for native videos and support for GeForce RTX 20 Series GPUs.  ( 7 min )
  • Open

    New technique helps robots pack objects into a tight space
    Researchers coaxed a family of generative AI models to work together to solve multistep robot manipulation problems.  ( 11 min )
  • Open

    Model metamers reveal divergent invariances between biological and artificial neural networks
    submitted by /u/Chipdoc [link] [comments]  ( 8 min )

  • Open

    Article: Key Concepts and Open Questions in a Golden Age for Natural Language Understanding
    submitted by /u/Stanford_Online [link] [comments]
    DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands
    🌟 Excited to share our recent research, DexCatch! Pick-and-place is slow and boring, while throw-catching is a behaviour towards more human-like manipulation. We propose a new model-free framework that can catch diverse objects of daily life with dexterous hands in the air. This ability to catch anything from a cup to a banana, and a pen, can help the hand quickly manipulate objects without transporting objects to their destination -- and even generalize to unseen objects. Video demonstrations of learned behaviors and the code can be found at https://dexcatch.github.io/. ​ https://reddit.com/link/17973ri/video/i4xdo39d4lub1/player submitted by /u/Shengjie_Wang [link] [comments]
    Help with Model Based Policy Optimization
    I am reading this paper and came across the following paragraph - "Model usage. Many recent model-based algorithms have focused on the setting in which model rollouts begin from the initial state distribution (Kurutach et al., 2018; Clavera et al., 2018). While this may be a more faithful interpretation of Algorithm 1, as it is optimizing a policy purely under the state distribution of the model, this approach entangles the model rollout length with the task horizon. Because compounding model errors make extended rollouts difficult, these works evaluate on truncated versions of benchmarks. The branching strategy described in Section 4.2, in which model rollouts begin from the state distribution of a different policy under the true environment dynamics, effectively relieves this limitation. In practice, branching replaces few long rollouts from the initial state distribution with many short rollouts starting from replay buffer states." What does state distribution mean here? Also, in line 8 of the image, I don't understand the relation between the model rollout and the policy \pi_t. Is it saying to use the model-free algorithm to take future steps from that state? What does the model have to do with that? https://preview.redd.it/twlej5my3kub1.png?width=1182&format=png&auto=webp&s=4a515c8d237c963052bc1b60a9e7dda53a33f001 submitted by /u/Academic-Rent7800 [link] [comments]
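A sketch of the branching step the quoted paragraph describes, under stated assumptions. "State distribution" here means the distribution of states a policy visits. Start states are drawn from the replay buffer (states that earlier policies reached in the real environment), the current policy chooses every action, and the learned model stands in for the environment by predicting the next state and reward for k steps. The toy `policy` and `model` below are hypothetical and only make the loop runnable.

```python
import random

def branched_rollouts(replay_states, policy, model, k, n_starts, rng):
    """Generate short model rollouts that branch off real states.

    replay_states: states visited in the real environment by earlier policies.
    policy(state) -> action: the current policy picks every action.
    model(state, action) -> (next_state, reward): learned dynamics model that
    replaces the real environment for these k steps.
    """
    model_buffer = []
    for _ in range(n_starts):
        state = rng.choice(replay_states)
        for _ in range(k):
            action = policy(state)
            next_state, reward = model(state, action)
            model_buffer.append((state, action, reward, next_state))
            state = next_state
    return model_buffer

# Toy stand-ins (hypothetical): 1-D integer state, policy moves toward 0.
policy = lambda s: -1 if s > 0 else 1
model = lambda s, a: (s + a, -abs(s + a))

rng = random.Random(0)
buffer = branched_rollouts([3, -2, 5], policy, model, k=2, n_starts=4, rng=rng)
```

The transitions collected in `buffer` are what the model-free learner then trains on, so the policy supplies the actions and the model supplies the outcomes. Short k keeps compounding model error small, and many start states keep coverage broad.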
    math prerequisites for reinforcement learning research?
    hi all! i’m an undergraduate that is really interested in pursuing a PhD. i think reinforcement learning is especially interesting, causal reinforcement learning in particular. for my current research job, which unfortunately doesn’t really involve ML, i read a little about causal inference and it really intrigued me. what mathematics courses should i take to get into RL research at a theoretical/algorithmic level? i am currently taking proof-based linear algebra, and have taken all the computational calculus offered. i imagine prob. theory/math stats is pretty important, too; what else? submitted by /u/treeman0469 [link] [comments]
  • Open

    [D] Exploring Methods to Improve Text Chunking in RAG Models (and other things...)
    Hello everyone, I'm currently working on Retrieval Augmented Generation (RAG) models and have developed a custom chunking function, as I found the methods in LangChain not entirely satisfactory. I'm keen on exploring other methods, algorithms (related to NLP or otherwise), and models to enhance text chunking in RAG. There are many RAG implementations out there, but I've noticed a lack of focus on improving chunking performance specifically. Are there any other promising approaches beyond my current pipeline, which consists of a bi-encoder (retriever), cross-encoder (reranker), and a Large Language Model (LLM) for interactions? For queries, I'm using both traditional and HyDE (Hypothetical Document Embedding) approaches in the retrieval phase, and sending the top 'n' results of both similarity search to the reranker. I've also tried using an LLM to convert the query into a series of 10-20 small phrases or keywords, which are then used as the query for the retriever model. However, the results vary depending on the LLM used. To generate good keywords (with a not extractive approach) , I had to use a "CoT" prompt, instructing the model to write self-instruct, problem analysis and reasonings before generating the required keywords. But this approach use lots of tokens, and requires careful scraping to ensure the model has used the right delimiter to separate reasoning and the actual answer. I'm also planning to modify the text used to generate embeddings, while returning the original text after the recall phase. But this is still a work in progress and scaling it is proving to be a challenge. If anyone has any tips or experience with this, I'd appreciate your input. I'd be grateful for any resources, repositories, libraries, or existing implementations of novel chunking methods that you could share. Or we could just discuss ideas, thoughts, or approaches to improve text chunking for RAG here. Thanks in advance for your time! 
submitted by /u/BXresearch [link] [comments]  ( 9 min )
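    The post does not show its custom chunking function, so the following is only a minimal sketch of one common baseline: greedy sentence-based chunks with a small sentence overlap. The function name, the naive regex sentence splitter, and the `max_chars` / `overlap_sentences` parameters are all assumptions for illustration, not the poster's method.

```python
import re


def chunk_text(text: str, max_chars: int = 500, overlap_sentences: int = 1) -> list[str]:
    """Greedy sentence-based chunking with a small sentence overlap (hypothetical helper)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace (an assumption;
    # a real pipeline would likely use a proper sentence segmenter).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last sentence(s) over so context spans the chunk boundary.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

    A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, so chunks can exceed the limit; whether that trade-off suits a given retriever is a design choice.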
    [D] Rate my GPU server for Deep Learning
    I started learning deep learning last year and decided to step up my game with regard to model training and tools. I recently built a GPU server. It’s still within its return period, so please help me decide if it’s worth keeping. Processor: 2x Xeon E5-2690 v4 2.6GHz 14-Core; Memory: 128GB; GPU: 8x NVIDIA Tesla P100 16GB HBM2 Accelerator Card; Total cost: ~$3200. submitted by /u/Stonks-Stocks [link] [comments]  ( 9 min )
    [N] "How to Apply to Grad School" webinars by CMU RI!
    We are hosting a few "How to Apply to Grad School" webinars this week. This is a chance to hear from faculty and students in the Robotics Institute at CMU on what life in grad school is actually like, as well as get some tips on crafting a strong application! https://cmu-ri-resources.github.io/ submitted by /u/bart-ai [link] [comments]  ( 9 min )
    [R] Microsoft presents Table-GPT: Table-tuned GPT for Diverse Table Tasks
    Tables pack tons of relational data but are tough for AI to grasp. They have complex 2D structure with information scattered across rows and columns. Models like GPT-3 fail basic tasks like finding where a missing value should go. LLMs struggle at this because they're pre-trained mostly on natural text, which is linear. Researchers at Microsoft wanted to mitigate this with "table-tuned" models, trained on table-related tasks. Their process: (1) automatically generate lots of diverse table-task training cases from a corpus of real-world tables, e.g. "impute missing value" or "identify error in table"; (2) further augment the data via paraphrasing, shuffling table rows/columns, chaining model responses, etc. This table-tuning produced "Table-GPT" models with substantially stronger table skills. In experiments, Table-GPT crushed vanilla GPT-3: it was 25%+ better on unseen table tasks like missing value ID and column type ID; it beat GPT-3 on 98% of test cases across 9 different table tasks; and it stayed superior after downstream tuning too. There's tons more work to do but it seems pretty promising. Table-tuning boosted models' ability to comprehend tables and reason over tabular data vs just pre-training on text. TLDR: Training AI models more on synthesized table tasks ("table-tuning") significantly improves their table skills. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Text-to-pose?
    When are we getting a text-to-pose ai? I'd love to be able to generate poses for 3d models that match a given text description, because sometimes what my mind comes up with doesn't feel adequate. It's frustrating that I'm not seeing any developments in this area of ai, and I lack the skills to commence the developments myself. submitted by /u/BM09 [link] [comments]  ( 9 min )
    [R] Google Pali-3 Vision Language Models: Contrastive Training Outperforms Classification
    submitted by /u/currentscurrents [link] [comments]  ( 8 min )
    [D] Sources of esoteric data? Specifically looking for 6dof motion data from a medium to large oceangoing vessel underway in various sea states.
    I am ok with paying for the data. I just can't find any sources for it. I found some data on github that appears to come from container ships at port, but nothing for a ship underway. submitted by /u/jschall2 [link] [comments]  ( 9 min )
    [D] Adding a modality to a pre-trained model
    Hi, I have a dataset with video and other modalities (e.g. audio), and I want to run a captioning task. I found UniVL, which is a pre-trained model that supports video and text (transcripts) and can caption them. It extracts features and runs transformer encoders on both these modalities to get an embedding, then concatenates them and feeds it into a cross-encoder and decoder to get captions. I'm wondering if I can make use of this model, but add in other modalities, by writing my own embedding model and feeding the embeddings into the cross encoder. Would this work? Is there any similar previous work regarding adding new modalities to a pre-trained network? submitted by /u/joeswansonx69x [link] [comments]  ( 9 min )
    [D] What is the current SOTA of Neural Architecture Search (NAS)?
    I've seen classic papers before 2021 that have been quite influential - RL and evolution based strategies. I have also seen differentiable approaches (https://arxiv.org/abs/1806.09055) and zero-learning approaches (https://arxiv.org/abs/2006.04647). But these are all papers pre-2021. From people who are familiar with this field, what is the current SOTA of neural architecture search (NAS) post 2022? i.e. papers that can serve as the most relevant baselines? Thank you! :) submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    [D] Is active learning a dying field in industry, given the development in few shot/zero shot learning?
    Is active learning a dying topic now that zero-shot learning has come out? Active learning uses a few labeled samples plus an initially trained model to select the most useful unlabeled data for training. Zero/few-shot learning trains a model on some data and then makes it work directly with unseen labels/data. In my understanding, zero/few-shot learning is more aligned with the current large-model or foundation-model trend. Active learning still seems to rely on a small dataset and is intended to gradually enrich the training data by selecting new samples. In industry and in big tech, which one is more used or deployed? Can anyone give me some comments? submitted by /u/Little-Bumblebee-452 [link] [comments]  ( 9 min )
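    For readers unfamiliar with the selection step the post describes, here is a minimal sketch of one classic active-learning strategy, entropy-based uncertainty sampling. The function names and the dict-of-probabilities input are assumptions for illustration; the post itself names no specific strategy.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(pool_probs: dict[str, list[float]], budget: int) -> list[str]:
    """Return the ids of the `budget` unlabeled samples the model is least sure about.

    `pool_probs` maps a sample id to the current model's predicted class
    probabilities for that sample (hypothetical input format).
    """
    ranked = sorted(pool_probs, key=lambda sid: entropy(pool_probs[sid]), reverse=True)
    return ranked[:budget]
```

    The selected samples would then be labeled, added to the training set, and the model retrained before the next selection round.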
    [R] Decoding LLM Uncertainties for Better Predictability
    Hi all, Building off our last research post, we wanted to figure out ways to quantify "ambiguity" and "uncertainty" in prompts/responses to LLMs. We ended up discovering two useful forms of uncertainty: "Structural" and "Conceptual" uncertainty. In a nutshell: Conceptual uncertainty is when the model isn't sure what to say, and Structural uncertainty is when the model isn't sure how to say it. You can play around with this yourself in the demo or read about it in more detail in the blog post. submitted by /u/shayanjm [link] [comments]  ( 9 min )
    [D] For large datasets, is your data selection process limiting model performance?
    I often hear from folks with very large datasets saying: “my labelling costs keep increasing, but we don’t see model performance improvements” or “my storage and compute costs are rising (for a dataset of 1M+ images) but performance just stalled”. This post argues that large datasets have hidden costs beyond time and money: poor data quality and the wrong selection process might be killing model performance. Any thoughts? Have you faced this challenge? submitted by /u/btcmx [link] [comments]  ( 9 min )
    [P] SemanticSearch for PDF mining
    Hello, everyone! I'm seeking tips to enhance my semantic search pipeline. Currently, I'm working on a semantic search tool. Given a set of text files, my goal is to retrieve the most relevant information related to the query. To achieve this, I begin by preprocessing the PDF files, splitting them into pages, and computing embeddings using a fine-tuned BERT model for Italian. Next, with a query and its embedding, I calculate the cosine similarity to all the pages in the document. Since there aren't many pages, a brute-force search remains quite fast. However, I'm encountering an issue where the similarity results don't consistently yield the most relevant information. I've experimented with various embedding layers, but there's been little to no improvement. I've also tested a commercially available solution to ensure the problem isn't with my PDF files. Interestingly, I achieved better results with it, leading me to believe that the issue may lie within my pipeline. My current hypothesis is that the page splitting process might be excluding relevant semantic connections, and I may need to improve my text preprocessing. What suggestions do you have to enhance my results? P.S. The information obtained from the similarity check is subsequently used as context with a chat language model, similar to tools like AsMyPdf. submitted by /u/AcquaFisc [link] [comments]  ( 9 min )
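    One way to test the page-splitting hypothesis above is to replace per-page passages with overlapping word windows that cross page boundaries, then rank them by cosine similarity as before. This is a rough sketch only: the function names, window sizes, and plain-list vectors are assumptions, and the embedding model itself is left out.

```python
import math


def sliding_passages(pages: list[str], size: int = 200, stride: int = 100) -> list[str]:
    """Cut the concatenated pages into overlapping word windows (hypothetical helper)."""
    words = " ".join(pages).split()
    if not words:
        return []
    passages = []
    start = 0
    while True:
        passages.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
        start += stride
    return passages


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], passage_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Brute-force search: indices of the k passages most similar to the query."""
    scored = sorted(
        range(len(passage_vecs)),
        key=lambda i: cosine(query_vec, passage_vecs[i]),
        reverse=True,
    )
    return scored[:k]
```

    Each passage would be embedded with the same model as the query; the window size and stride are tuning knobs rather than recommended values.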
    [D] Good compression algo to compress model checkpoints?
    I have a couple of terabytes of checkpoints, and I desperately need to free up some space, without deleting those atm. Is there a compression algorithm that can handle such data successfully? I tried gzip with tar but the compressed size ended up being only ~100G less - that's when I realized that (gzip) compression algo is not good at handling seemingly random numerical data. Do you know of methods that've proven to work in this scenario? submitted by /u/OpeningVariable [link] [comments]  ( 9 min )
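    One lossless trick sometimes applied to floating-point data before a general-purpose compressor is byte shuffling: grouping the first byte of every value together, then the second, and so on, so that the more regular sign/exponent bytes sit next to each other. The sketch below uses only the standard library and assumes little-endian float32 values; the function names are made up for illustration, and how much it helps (if at all) depends on the checkpoint contents.

```python
import struct
import zlib


def shuffle_bytes(raw: bytes, itemsize: int = 4) -> bytes:
    """Group byte 0 of every value, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle_bytes(shuffled: bytes, itemsize: int = 4) -> bytes:
    """Inverse of shuffle_bytes: interleave the byte planes back into values."""
    n = len(shuffled) // itemsize
    planes = [shuffled[i * n:(i + 1) * n] for i in range(itemsize)]
    return bytes(b for group in zip(*planes) for b in group)


def compress_floats(values: list[float]) -> bytes:
    """Pack as little-endian float32, byte-shuffle, then deflate."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(shuffle_bytes(raw), 9)


def decompress_floats(blob: bytes) -> list[float]:
    """Undo compress_floats and return the float32 values as Python floats."""
    raw = unshuffle_bytes(zlib.decompress(blob))
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

    The round trip is exact at float32 precision; comparing the output size against plain gzip on a sample checkpoint is the only reliable way to see whether the shuffle is worth it.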
    [R] Think before you speak: Training Language Models With Pause Tokens
    https://arxiv.org/pdf/2310.02226.pdf Abstract Language models generate responses by producing a series of tokens in immediate succession: the (K+1)th token is an outcome of manipulating K hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, K+10 hidden vectors, before it outputs the (K+1)th token? We operationalize this idea by performing training and inference on language models with a (learnable) pause token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate pause-training on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of 18% EM score on the QA task of SQuAD, 8% on CommonSenseQA and 1% accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm. Here is a Medium post about my thoughts on the paper. submitted by /u/transformer_ML [link] [comments]  ( 9 min )
    [D] Can Direct Preference Optimization (DPO) be used to replace any type of RL for LLMs, or is it better suited for just scenarios like RLHF?
    DPO Paper I read a really fascinating paper where RL was used on LLMs to make them better at interacting in embodied environments. https://arxiv.org/abs/2310.08588 The technique was called Reinforcement Learning with Environmental Feedback (RLEF). In the paper PPO was used, but I'm wondering if DPO could be used to replace it? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    How to create dataset for training generative chatbot model? [D]
    I built my own custom generative AI chatbot model. The only thing I need is a high-quality, diverse dataset to train it. I can't use existing datasets because I don't think they are diverse or high-quality enough, so I need to create one using GPT-4. My dataset will have 3 columns: system_prompt, input, output. But I'm not very experienced at creating datasets, and I couldn't find any resources about this. The input, output, and system prompt should all be created by GPT-4. How can I do it? And what is the most effective way to use the API for this? submitted by /u/Many-Corner-6700 [link] [comments]  ( 9 min )
    [P] MergeLlama-7b - A fine tune of CodeLlama for resolving merge conflicts
    Merge conflicts are something that give developers hours of headaches and I figured I would try and give my take on a solution. I followed a paper from IEEE engineers in 2022 who trained CodeBert on merge conflicts as a classification task, and they published their dataset for public use. Input formatted as “>>>>>>” will output the attempted conflict resolution. I am still trying to find out how to do evaluations on this model as the loss applies to all sections not just the resolution, and the TRL Trainer with a data collator gives NaN as a loss. The model and dataset are on HuggingFace under codys12/MergeLlama and codys12/MergeLlama-7b. Any feedback is appreciated! submitted by /u/cstein123 [link] [comments]  ( 9 min )
    [P] OpenLLMetry, a way to get complete visibility into RAG pipelines with your existing tools
    Hey, I've built a set of extensions for OpenTelemetry that provides visibility into LLM applications like RAG pipelines - whether it be prompts, vector DBs and more. Here’s the repo: https://github.com/traceloop/openllmetry. Two key benefits with OpenTelemetry are - You can trace your entire system execution, not just the LLM (so you can see how requests to DBs, or other calls affect the overall result); You can connect to any monitoring platform—no need to adopt new tools. Install the SDK and plug it into Datadog, Sentry, or both. Or switch between them easily. There's already support for OpenAI, Anthropic, Cohere, Pinecone, Chroma, LangChain, and Haystack and we are working hard to support the entire ecosystem. Would love to hear your thoughts submitted by /u/nirga [link] [comments]  ( 9 min )
    Can AI Replace Developers? Princeton and University of Chicago's SWE-bench Tests AI on Real Coding Issues [N]
    Exploiting AI to make software programming easier? SWE-bench, a unique evaluation system, tests language models' ability to solve real GitHub-collated programming issues. Interestingly, even top-notch models manage only the simplest problems, underscoring tech development's urgency for providing practical software engineering solutions. A New Approach to Evaluating AI Models Researchers use real-world software engineering problems from GitHub to assess language models' coding problem-solving skills. SWE-bench, introduced by Princeton and the University of Chicago, offers a more comprehensive and challenging benchmark…  ( 9 min )
    How to design a Chat-GPT or Bard-like large scale app with your own foundational model? [D]
    I am just puzzled how does one efficiently query a huge transformer model such that so many users can be served at the same time. Is it queried on per user basis? (modulo some caching) If yes, how expensive is this? If no, what the hell is going on? :D Are there any good resources on this? (how to build large scale apps with big models, from scratch). Somehow this doesn't really fit the standard data-intensive system design process, or maybe I am missing something. submitted by /u/jimmymvp [link] [comments]  ( 9 min )
  • Open

    Taken on my screen, but I can’t get over what it has become. I’m obsessed with AI.
    submitted by /u/Prestigious_Rough704 [link] [comments]
    I built an AI tool to help authors create webcomics
    I always did want to draw a comic but I was never very good at drawing even though I put a lot of effort into it when I was younger... :'( So when I stumbled on image generation AI, I thought maybe it could help me transform my doodles into something decent. It took me a while and a lot of effort to write a tool to help me with that: story and dialogues are my own, images are based on doodles enhanced by AI. I would love to have feedback about the story: https://stripik.com/story/4/chapter/4/ https://preview.redd.it/dvcudd4j3mub1.png?width=800&format=png&auto=webp&s=717bef60eaaf9b9a35a1a66f266c374406a923fa submitted by /u/maxcmoi [link] [comments]
    I'm chronicling the process of trying to create a boardgame with Chat GPT and it's amazing just how great of an assistant it is!
    submitted by /u/SexyJimBelushi [link] [comments]
    If SEO tools were Nintendo 3DS games [Powered by AI]
    Did you play these (SEO) games? 👾 https://preview.redd.it/yxuzllzupkub1.jpg?width=661&format=pjpg&auto=webp&s=23ebc6e972ac85b152aa8b69f48e2b0c5bae2c76 https://preview.redd.it/x8zfokzupkub1.jpg?width=661&format=pjpg&auto=webp&s=be2163a7bfbeee64a63c1292a5b4c482c5be33ae https://preview.redd.it/eerpgnzupkub1.jpg?width=661&format=pjpg&auto=webp&s=d8eceafd3732653c743a6731ae5932c9e0da071c https://preview.redd.it/uxwgskzupkub1.jpg?width=661&format=pjpg&auto=webp&s=07c751eaa16f8fa484034c98a3c1fd0b2162f5a2 Source: https://twitter.com/carlos_darko/status/1713900305765605484 submitted by /u/DanielPeris [link] [comments]
    Deep fake language change
    What is the best free tool to make a video where the language changes? submitted by /u/Easy_Technology6768 [link] [comments]
    One-Minute Daily AI News 10/15/2023
    New York-based tech firms and investors see the advent of AI as the latest opportunity to try to unseat the Bay Area as tech’s global capital.[1] Microsoft announced a new “bug bounty” program, vowing to reward security researchers between $2,000 and $15,000 if they’re able to find “vulnerabilities” in its Bing AI products, including “jailbreak” prompts that make it produce responses that go against the guardrails that are supposed to bar it from being bigoted or otherwise problematic.[2] OpenAI is preparing to launch a suite of updates to make it more cost-effective and efficient for developers to create software applications with AI models.[3] TCS Seeks to Use Microsoft AI Partnership to Improve Margins.[4] Sources: [1] https://www.axios.com/2023/10/12/new-york-ai-world-capital [2] https://futurism.com/the-byte/microsoft-bing-ai-bug-bounty [3] https://www.techedt.com/openai-aims-to-attract-developers-with-cost-effective-updates-insiders-reveal [4] https://www.bloomberg.com/news/articles/2023-10-15/tcs-seeks-to-use-microsoft-ai-partnership-to-improve-margins#xj4y7vzkg submitted by /u/Excellent-Target-847 [link] [comments]
    Are there any image generators that can generate the same image you upload to it, but from a different hypothetical angle?
    I was wondering if any AI image generation was good at this (yet?). I have a real-life image I want to upload and get AI to generate what that would most likely look like from the vantage point of someone standing at a different angle. submitted by /u/YepperyYepstein [link] [comments]
    AI dubbing ( local )
    Hi there, does anybody know how AI dubbing translators work? I'm interested in whether something similar to https://app.rask.ai/ exists locally. Is there anything on GitHub? I'm looking for the Czech language. I know you can transcribe audio to text, then translate the text and let an AI speak it. But is there a tool that does all of this in one click? Thank you and have a nice day. submitted by /u/Low_Government_681 [link] [comments]
  • Open

    A method to interpret AI might not be so interpretable after all
    Some researchers see formal specifications as a way for autonomous systems to "explain themselves" to humans. But a new study finds that we aren't understanding.  ( 9 min )
  • Open

    How Veriff decreased deployment time by 80% using Amazon SageMaker multi-model endpoints
    Veriff is an identity verification platform partner for innovative growth-driven organizations, including pioneers in financial services, FinTech, crypto, gaming, mobility, and online marketplaces. In this post, we show you how Veriff standardized their model deployment workflow using Amazon SageMaker, reducing costs and development time.  ( 8 min )
  • Open

    DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
    How trustworthy are generative pre-trained transformer (GPT) models? To answer this question, University of Illinois Urbana-Champaign, together with Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research, released a comprehensive trustworthiness evaluation platform for large language models (LLMs), which is presented in the recent paper: DecodingTrust: A Comprehensive Assessment of Trustworthiness […] The post DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models appeared first on Microsoft Research.  ( 11 min )
  • Open

    Explainable Artificial Intelligence (XAI) for AI & ML Engineers
    Introduction Hello AI&ML Engineers, as you all know, Artificial Intelligence (AI) and Machine Learning Engineering are the fastest growing fields, and almost all industries are adopting them to enhance and expedite their business decisions and needs; for the same, they are working on various aspects and preparing the data for the AIML platform with the help of SMEs… The post Explainable Artificial Intelligence (XAI) for AI & ML Engineers appeared first on Data Science Central.  ( 23 min )
  • Open

    Nearest, easiest, and most accessible
    From Love What Lasts, Joshua Gibbs: … there are too many things in the world to care equally about them all. The sheer volume of things … demands that we have hierarchical standards by which to judge their value, or else we are condemned to give our lives over entirely to what is nearest, easiest, […] Nearest, easiest, and most accessible first appeared on John D. Cook.  ( 4 min )
  • Open

    Benchmarking Bit Errors in Quantized Neural Networks with PyTorch
    Similar to my article series on adversarial robustness, I was planning to have a series on bit errors robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub. The post Benchmarking Bit Errors in Quantized Neural Networks with PyTorch appeared first on David Stutz.  ( 6 min )
  • Open

    Rethinking the Role of PPO in RLHF
    TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in the form of comparisons, and the RL fine-tuning phase, which optimizes a single, non-comparative reward. What if we performed RL in a comparative way? Figure 1: This diagram illustrates the difference between reinforcement learning from absolute feedback and relative feedback. By incorporating a new component, pairwise policy gradient, we can unify the reward modeling stage and RL stage, enabling direct updates based on pairwise responses. Large Language Models (LLMs) have powered increasingly capable virtual assistants, such as GPT-4, Claude-2, Bard and Bing Chat. These systems can respond to complex user queries, write code, and even produce poetry. T…  ( 6 min )

  • Open

    Johnson circle theorem
    Draw three circles of radius r that intersect at a single point. Then draw a triangle connecting the remaining three points of intersection. (Each pair of circles intersects in two points, one of which is the point where all three circles intersect, so there are three other intersection points.) Then the circumcircle of the triangle, […] Johnson circle theorem first appeared on John D. Cook.  ( 5 min )
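    The excerpt is cut off before the conclusion; Johnson's theorem, as commonly stated, says the circumcircle of that triangle also has radius r. The following is a small numerical check, not taken from the post: it places the common point at the origin, picks the three centers at angles of our own choosing, and uses the fact that two such circles with centers c_i and c_j meet again at c_i + c_j.

```python
import math


def circumradius(p, q, s) -> float:
    """Circumradius R = abc / (4 * area) of the triangle with vertices p, q, s."""
    a = math.dist(q, s)
    b = math.dist(p, s)
    c = math.dist(p, q)
    # |cross| equals twice the triangle's area, so 4 * area = 2 * |cross|.
    cross = abs((q[0] - p[0]) * (s[1] - p[1]) - (q[1] - p[1]) * (s[0] - p[0]))
    return a * b * c / (2.0 * cross)


def johnson_triangle(r: float, angles: tuple[float, float, float]):
    """Vertices of the triangle formed by the three non-common intersection points.

    Each center sits at distance r from the origin, so every circle passes
    through the origin (the common point). Circles i and j meet again at
    c_i + c_j, the reflection of the origin across the line through their centers.
    """
    c1, c2, c3 = [(r * math.cos(t), r * math.sin(t)) for t in angles]

    def add(u, v):
        return (u[0] + v[0], u[1] + v[1])

    return add(c1, c2), add(c2, c3), add(c1, c3)


if __name__ == "__main__":
    vertices = johnson_triangle(2.0, (0.3, 2.0, 4.1))
    print(circumradius(*vertices))  # should agree with r = 2.0 up to rounding
```

    The angles are arbitrary as long as the three centers are distinct, which keeps the triangle non-degenerate.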
  • Open

    NVIDIA Blackwell B100 GPUs To Feature SK Hynix HBM3e Memory, Launches In Q2 2024 Due To Rise In AI Demand
    submitted by /u/norcalnatv [link] [comments]
    Researchers propose GameGPT: A multi-agent approach to fully automated game development
    Game dev is super complex nowadays - games have huge codebases, massive teams, and dev cycles dragging on for years. Costs are insane too - budgets can hit $100M+ easily. In a new paper, researchers propose to reverse this trend with an AI framework called GameGPT that automates parts of the dev process using multiple AI agents. Each agent handles a different role (all are fine-tuned from relevant base models): (1) one agent reviews the game design plan to catch errors; (2) another turns tasks into code implementations; (3) reviewer agents check the code and results; (4) a testing agent validates that everything works as expected. By breaking up the workflow, GameGPT can simplify things for the AI agents. They just focus on a narrow role versus having one jack-of-all-trades agent. The authors argue GameGPT can eliminate repetitive and rote elements of gamedev like testing. This would free up developers to focus on creative design challenges. However, the GameGPT paper does not include any concrete results or experiments demonstrating improved performance. There is no evidence presented that GameGPT reduces hallucinations, redundancy or development time. The authors mention empirical results support their claims that the architecture is more effective, but none are provided. I could not find any additional support material about this work, like a project website, that I could use to further check into this (maybe someone can share in the comments?). Right now GameGPT seems mostly conceptual. The ideas are interesting but hard to assess without quantitative results. TLDR: New GameGPT AI framework aims to automate tedious parts of game development using specialized agents. No concrete results were provided in the paper - someone will need to test this out and report back. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Speech Condenser: An Advanced On-Premise Pipeline Tool for Streamlining and Summarizing Dialogues from Videos
    submitted by /u/nez_har [link] [comments]
    Best way to produce consistent images?
    Hi! I'm trying to jazz up my design portfolio for applying for jobs, and I wanted to insert some cute illustrations on each project page. The projects deal with a variety of topics so I'll need pictures of many things, but want to keep the style quite consistent. What is the best AI tool right now to do this? I paid for Midjourney but I can't seem to understand how to get it to do this. For example I got this image from DALLE and love the style, the white background also helps make it look better on the portfolio. I'd want another image in the same style of two kids throwing a ball, but can't figure out how to do it. Alternatively if I could upload this image to an AI and say "in the same style, generate..." that would be great too. Thank you! submitted by /u/_Dip_ [link] [comments]
    Messi vs Ronaldo | Freestyle Rap Song | AI Rap Song | Tell your opinion on this video
    submitted by /u/Agitated-Spell3979 [link] [comments]
    Seeking Your Feedback on a new community around Open-Source AI Code Generation Models
    Currently, we are building a community that is specifically dedicated to Open-Source AI Code Generation Models. Our aim is to create a thriving ecosystem where developers, enthusiasts, and experts can come together to drive innovation, share insights, and promote a collaborative approach to AI code generation. I wanted to provide you with an overview of the key features we're integrating into this community: 1. Collaboration: A dedicated space where enthusiasts and experts alike can collaborate on projects, share their findings, and work on enhancing existing models. 2. Discussion: Whether through forums or chat platforms, we aim to foster discussions around the challenges, breakthroughs, and best practices in the realm of AI code generation. 3. Resource Sharing: Our community will feature a repository/platform for members to freely share and access open-source models, datasets, and other essential tools. With your experience and insight into the AI domain, we would greatly appreciate your feedback on the following: - Do you believe such a community would be valuable to you personally or to the wider developer community? - Would you consider becoming a part of such a community? - Are you already a part of a similar community, so that this one might not be of much value to you? - Any other suggestions or feedback? Your candid feedback on this idea, its potential impact, and any suggestions you might have will be invaluable to us as we continue shaping this community's structure and offerings. submitted by /u/akanshtyagi [link] [comments]
    Biden eyes adding AI chip curbs to Chinese companies abroad
    The Biden administration is considering closing a loophole that gives Chinese companies access to American artificial intelligence (AI) chips through units located overseas. The United States previously restricted shipments of AI chips to China but left overseas subsidiaries of Chinese companies with unfettered access. The Biden administration is now looking for ways to close this loophole and prevent China from accessing top AI technology. However, it is challenging to plug every gap in export controls. Chinese firms are purchasing chips for use in data centers abroad, and it is difficult for the United States to police those transactions. The United States has been seeking to halt the rise of China's AI capability, which depends on its access to U.S. chips. Washington has been working to close other loopholes that allow AI chips into China, and the new rules expected this month will likely apply those same restrictions more broadly to all companies in the market. The U.S. government is also grappling with the issue of Chinese parties accessing U.S. cloud providers like Amazon Web Services. Overall, the Biden administration is facing challenges in cutting China off from top AI technology and closing all loopholes in export controls. Source : https://www.reuters.com/technology/biden-eyes-adding-ai-chip-curbs-chinese-companies-abroad-2023-10-13/ submitted by /u/NuseAI [link] [comments]
  • Open

    SOTA Facial Recognition [D]
    I want to sort folders of pictures of people by their similarity to an input photo. I managed to use DeepFace but I'm wondering if anyone knows a better method? submitted by /u/RedditAlreaddit [link] [comments]  ( 9 min )
    [R] Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
    Paper: https://arxiv.org/abs/2309.17382 Project page: https://agentification.github.io/RAFA Code: https://github.com/agentification/RAFA_code Reason for future, act for now (RAFA) TL;DR: - The first autonomous LLM agent RAFA with provable regret guarantees and outstanding empirical performances. - SOTA results on Game of 24, ALFWorld, BlocksWorld, and Tic-Tac-Toe. submitted by /u/WolverineUnable5957 [link] [comments]  ( 9 min )
    [P] Machine Learning Algorithm from Scratch
    submitted by /u/shaongit [link] [comments]  ( 8 min )
    [D] Was any further work done on the paper "Large-Scale Study of Curiosity-Driven Learning" in recent years?
    So, a few weeks ago, I got interested in the exploration problem in Reinforcement Learning and came across this amazing paper. Just wanted to know if any of you came across any paper which explores this idea more or takes it forward. Thanks in advance. submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 9 min )
    [R] tool to brainstorm novel ideas
    Hey folks, I developed a research tool https://idea-factory.ngrok.dev/ (Login: temp@holistic-intelligence.net, Password: noidea) to identify novel research problems grounded in the scientific literature. Given an idea that intrigues you, the tool identifies the most relevant pieces of literature, creates a brief summary, and provides three possible extensions of your idea. I would be happy to get your feedback on their usefulness. Thank you in advance! submitted by /u/Ma7dy [link] [comments]  ( 9 min )
    [P] Oddly Satisfying Animation of Pixel Shuffle
    submitted by /u/Animated-AI [link] [comments]  ( 8 min )
    [D] Pipeline for data processing in time series forecasting?
    What is the correct pipeline for data processing when conducting time series forecasting? Should we begin with data normalization/standardization, followed by feature selection, and then split the data into training, validation, and test sets? Or is it advisable to initially split the data to prevent spill-over effects? I'm concerned about the possibility of training my model on (part of) the test data, which could result in spill-over effects. However, if the recommended approach is to split the data first and then perform normalization and feature selection, what impact would this have on the selected features? Does the manner in which we split the data into random time periods matter, or is it necessary to incorporate a validation method that accounts for temporal effects? I'm worried that the selected features might depend on the time period I choose for my training and test sets. What is the best practice in this scenario? submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
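    One common way to avoid the spill-over described above is to split chronologically first and then fit any normalization on the training window only. The following is a minimal sketch in plain Python, not a definitive pipeline; the helper names (`chronological_split`, `fit_standardizer`, `standardize`) and the 60/20/20 split sizes are assumptions made for illustration.

```python
from statistics import mean, pstdev

def chronological_split(series, n_val, n_test):
    """Split an ordered series into train/val/test without shuffling."""
    n_train = len(series) - n_val - n_test
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

def fit_standardizer(train):
    """Compute mean and standard deviation on the training window only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # fall back to 1.0 for a constant series
    return mu, sigma

def standardize(values, mu, sigma):
    """Apply statistics that were fitted elsewhere (here: on train)."""
    return [(v - mu) / sigma for v in values]

# Toy series: 100 ordered observations (illustrative data only).
series = [float(t) for t in range(100)]
train, val, test = chronological_split(series, n_val=20, n_test=20)

mu, sigma = fit_standardizer(train)      # no val/test values are seen here
train_z = standardize(train, mu, sigma)
val_z = standardize(val, mu, sigma)      # reuse the train statistics
test_z = standardize(test, mu, sigma)
```

    Feature selection can follow the same pattern: choose features using the training window, then apply that choice unchanged to validation and test. Repeating the whole procedure over several rolling or expanding windows is one way to check whether the selected features depend on the chosen period.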
    [R] Researchers propose GameGPT: A multi-agent approach to fully automated game development
    Game dev is super complex nowadays - games have huge codebases, massive teams, and dev cycles dragging on for years. Costs are insane too - budgets can hit $100M+ easily. In a new paper, researchers propose to reverse this trend with an AI framework called GameGPT that automates parts of the dev process using multiple AI agents. Each agent handles a different role (all are fine-tuned from relevant base models): One agent reviews the game design plan to catch errors Another turns tasks into code implementations Reviewer agents check the code and results A testing agent validates everything works as expected By breaking up the workflow, GameGPT can simplify things for the AI agents. They just focus on a narrow role versus having one jack-of-all-trades agent. The authors argue GameGPT can eliminate repetitive and rote elements of gamedev like testing. This would free up developers to focus on creative design challenges. However, the GameGPT paper does not include any concrete results or experiments demonstrating improved performance. There is no evidence presented that GameGPT reduces hallucinations, redundancy or development time. The authors mention empirical results support their claims that the architecture is more effective, but none are provided. I could not find any additional support material about this work, like a project website, that I could use to further check into this (maybe someone can share in the comments?). Right now GameGPT seems mostly conceptual. The ideas are interesting but hard to assess without quantitative results. TLDR: New GameGPT AI framework aims to automate tedious parts of game development using specialized agents. No concrete results were provided in the paper - someone will need to test this out and report back. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Generate audio samples based on prompt sample
    Hi, I would like to create a system that generates different audio samples based on an audio sample prompt. Does anyone know whether such a project or similar ideas have already been implemented? Or any suggestions on what to read in order to realize such a project? I have knowledge of ML programming and Python audio generation. submitted by /u/busconw [link] [comments]  ( 9 min )
    [D] Getting bad MFUs, what can I do to make it better
    Hi, so I've been working with NanoGPT, finetuning GPT-2, and I'm getting terrible MFUs: the 5 warmup steps show -100%, and normal steps have an MFU of around 3-4%. Most runs I hear of have an MFU of around 45%. How do I improve this? Colab -> https://colab.research.google.com/drive/1gvTsyjxHiDkKHFsnWWouzr1xJWW23BA3?usp=sharing Code -> https://github.com/VatsaDev/NanoPhi2 submitted by /u/vatsadev [link] [comments]  ( 9 min )
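    For context, MFU (model FLOPs utilization) is the FLOP/s a run achieves divided by the peak FLOP/s of the hardware. The sketch below follows the estimate nanoGPT uses as far as I recall it (roughly `6N + 12*L*H*Q*T` FLOPs per token, with the A100 bfloat16 peak of 312 TFLOP/s as the denominator); treat the constants as assumptions and check them against the repository. The timing and the alternative peak value are illustrative, not measurements. If the script reports MFU against an A100 peak while running on a smaller GPU, the number will look low for that reason alone, and a negative value during the first few steps is likely just an uninitialized placeholder.

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, block_size,
                 fwdbwd_per_iter, dt_seconds, peak_flops=312e12):
    """Return model FLOPs utilization as a fraction of peak_flops.

    peak_flops defaults to the A100 bfloat16 peak (an assumption here);
    pass the peak of the GPU actually in use for a fair number.
    """
    # Approximate FLOPs per token for a forward+backward pass.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_fwdbwd = flops_per_token * block_size
    flops_per_iter = flops_per_fwdbwd * fwdbwd_per_iter
    flops_achieved = flops_per_iter / dt_seconds  # FLOP/s actually delivered
    return flops_achieved / peak_flops

# GPT-2 small-like shape; all values below are illustrative.
gpt2_small = dict(n_params=124_000_000, n_layer=12, n_head=12,
                  head_dim=64, block_size=1024)

mfu_vs_a100 = estimate_mfu(**gpt2_small, fwdbwd_per_iter=8, dt_seconds=2.0)
mfu_vs_smaller_gpu = estimate_mfu(**gpt2_small, fwdbwd_per_iter=8,
                                  dt_seconds=2.0, peak_flops=65e12)
print(f"vs A100 peak: {mfu_vs_a100:.2%}, vs 65 TFLOP/s peak: {mfu_vs_smaller_gpu:.2%}")
```

    The same measured step time gives very different percentages depending on the peak it is divided by, so it helps to fix the denominator before tuning batch size, precision, or compilation.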
    [D] Check out my latest article on how the new improvements in GPT-4V(ision) can bring on a new era of computer vision models, fine-tuned on outputs of GPT-4V(ision).
    https://medium.com/@rishiswethan.c.r/how-gpt-4v-ision-will-revolutionise-image-annotation-b0d3ace64bff?source=friends_link&sk=4be42541a8a8ee40e18ef14533342cfd submitted by /u/Remarkable_Seesaw_89 [link] [comments]  ( 8 min )
    How to do object detection in Unity? Any good resources? [D]
    I have tried Barracuda and Vuforia, and they don’t work for some reason. I’m completely lost at the moment. It’s an object detection model to detect circuit schematic symbols using computer vision. submitted by /u/PreferenceFrosty2958 [link] [comments]  ( 9 min )
    [D] Running Large Language Models on CPU
    Fine-tuning large language models with the aim of obtaining a small but accurate model is extremely difficult, because you have to strike a balance between the model’s size and accuracy. Researchers from IST Austria & Neural Magic seem to have found a sweet spot. In their latest paper, they successfully applied sparse fine-tuning to MPT with remarkable performance. The MPT model was pruned to 75% sparsity without a drop in accuracy, showing performance that is on par with quantization approaches. In particular, the resulting sparse model can execute fast on CPUs by taking advantage of the sparsity. Instead of performing standard loss-based fine-tuning, which may fail to recover accuracy, the researchers experiment with distillation-type losses. These losses are better at recovering accuracy at high sparsity. What’s impressive is that the sparse fine-tuned LLM can achieve 7.7 tokens per second on a single core and 26.7 tokens per second on 4 cores of a cheap consumer AMD Ryzen CPU. The MPT-7B model was fine-tuned via SFT, obtaining a dense baseline that showed remarkable performance. This baseline was later pruned with SparseGPT to between 40% and 80% sparsity, reaching 5X compression ratios. By applying SquareHead KD, FP32 models with 75% sparsity can be obtained with no accuracy loss, outperforming cross-entropy and other KD methods. The paper is available on arXiv. Sparse Finetuning for Inference Acceleration of Large Language Models: https://huggingface.co/papers/2310.06927 MPT Sparse Finetuned on GSM8k with DeepSparse Hugging Face Space: https://huggingface.co/spaces/neuralmagic/sparse-mpt-7b-gsm8k submitted by /u/mwitiderrick [link] [comments]  ( 9 min )
    [P] I built an AI Writing Coach to proofread your work
    submitted by /u/hungryillini [link] [comments]  ( 8 min )
    [R] Conceptual Framework for Autonomous Cognitive Entities - Clemson University 2023 - Introducing the ACE Framework
    Paper: https://arxiv.org/abs/2310.06775 GitHub: https://github.com/daveshap/ACE_Framework Blog post: https://medium.com/@dave-shap/autonomous-agents-are-here-introducing-the-ace-framework-a180af15d57c Abstract: The rapid development and adoption of Generative AI (GAI) technology in the form of chatbots such as ChatGPT and Claude has greatly increased interest in agentic machines. This paper introduces the Autonomous Cognitive Entity (ACE) model, a novel framework for a cognitive architecture, enabling machines and software agents to operate more independently. Drawing inspiration from the OSI model, the ACE framework presents layers of abstraction to conceptualize artificial cognitive architectures. The model is designed to harness the capabilities of the latest generative AI technologies, including large language models (LLMs) and multimodal generative models (MMMs), to build autonomous, agentic systems. The ACE framework comprises six layers: the Aspirational Layer, Global Strategy, Agent Model, Executive Function, Cognitive Control, and Task Prosecution. Each layer plays a distinct role, ranging from setting the moral compass and strategic thinking to task selection and execution. The ACE framework also incorporates mechanisms for handling failures and adapting actions, thereby enhancing the robustness and flexibility of autonomous agents. This paper introduces the conceptual framework and proposes implementation strategies that have been tested and observed in industry. The goal of this paper is to formalize this framework so as to be more accessible. ​ https://preview.redd.it/7scnwk5a5dub1.png?width=850&format=png&auto=webp&s=371b5b02a453dcad3e70a2600cc2d625eda44133 ​ submitted by /u/Prior-Travel3670 [link] [comments]  ( 9 min )
    [D] Fine tune Llama2 with Lora for foreign language
    Hey folks, I watched a YouTube video, about how some LLMs tokenise languages other than English. For example for the Greek language you will see that this is failing totally, as one character is one token always: ​ https://preview.redd.it/835p97cyhcub1.png?width=1900&format=png&auto=webp&s=944b150cc0fc112cb8cd2bac600f6fcdcc85fb1e My question is, if I would fine-tune it with Alpaca Lora based on Greek text, would the tokeniser change and work properly? Or the fine tune would not work as the tokeniser cannot be retrained/tuned? submitted by /u/kostakos14 [link] [comments]  ( 9 min )
    [D] Advice for applying to undergraduate research internships?
    Hello, I’m a 3rd year data science and linguistics major at a top 30 school looking to land an industry research internship. I’d say I’m fairly competitive: extensive research experience, 2nd author at EMNLP, and an REU at a prestigious institute. I’m already looking at some places such as AI2, but I’m curious if there are other internships I should be aware of. submitted by /u/Kai_151 [link] [comments]  ( 9 min )
    [D] The history of neural networks is over. J. Schmidhuber proposes a giant network that includes all future neural network architectures as subcomponents.
    submitted by /u/fromnighttilldawn [link] [comments]  ( 8 min )
    [P] How do I make my CNN more efficient?
    I've been trying a variety of pre-constructed and self-made U-net-like CNNs. Had a few questions: When using torch summary, is there a general formula for estimating a model's inference time/backprop time and required GPU RAM based on the information torch summary gives (Total params, Trainable params, Non-trainable params, Total mult-adds (G), Input size, Forward/backward pass size (MB), Params size (MB), Estimated Total Size (MB)) and other hyper-parameters such as batch size? Why is my self-made model (which has smaller quantities in all the parameters torch summary outputs) requiring more GPU RAM AND taking more time for inference and backprop? Is the coding style for the model's class and its forward prop a huge factor here? If so, could you please provide tips for making my code more efficient? Here's the notebook showcasing a pre-made model from MONAI and two of my self-made models: https://colab.research.google.com/drive/1VRrdnzaAbp25_DtaWTKHW5JxzyhmueMC?usp=sharing I've also listed some of my observations on the models and their results in the notebook. Any ideas or suggestions would be much appreciated. submitted by /u/mimivirus2 [link] [comments]  ( 9 min )
    [P] Made a Python package for creating API endpoints with dynamic queries.
    submitted by /u/squirrels-api [link] [comments]  ( 9 min )
    [R] Supercharging reinforcement learning with logic
    Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation, have limited its applicability in domains where simulations run more slowly. Our solution is to use a logic-based framework, PyReason, as a proxy for the simulation. https://preview.redd.it/kdhpu9qraaub1.png?width=1786&format=png&auto=webp&s=8155ba38fc66bd3a2fe934b1f395351c4db68e2f We showed that inference with a PyReason logic program can provide up to a three-order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2). https://preview.…  ( 9 min )
    [D] transformers vs llama.cpp vs GPTQ vs GGML vs GGUF
    I am a little puzzled. I know that transformers is the HF framework/library for loading, running inference on, and training models easily, and that llama.cpp is another framework/library that does much the same but specializes in quantized models that run on CPU, and runs much faster. I understand that GGML is a file format for saving model parameters in a single file, that it's an old, problematic format, that GGUF is the new kid on the block, and that GPTQ is the equivalent quantized file format for models that run on GPU. So here is what I can't understand (assuming I got all the rest correct): Does HF Transformers support loading GGUF or GGML models? Does GGUF need a tokenizer JSON, or does that data come from within the GGUF file itself? And is safetensors (another file format) supported by both Transformers and llama.cpp? Since I cannot find Python examples for these combinations, I assume all the answers are no. Can anyone shed some light? submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
  • Open

    Hi everyone, I was following an online RL tutorial that uses Stable Baselines3 and OpenAI's Gym to implement a CartPole environment, but I have run into some problems. Can any of you please help me?
    I was following Nicholas Renotte's RL in 3 hours tutorial and I ran into this issue at timestamp 1:10:00 while testing my trained agent:
        ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
    This is my code for testing my environment:
        episodes=5
        for episode in range(1,episodes+1):
            obs=env.reset()
            done=False
            score=0
            print(obs)
            while not done:
                env.render()
                action, _ = model.predict(obs) #Now using model here
                obs, reward, done, truncated, info = env.step(action)
                score += reward
            print('Episode:{} Score:{}'.format(episode,score))
        env.close()
    And this is the environment I am using:
        environment_name = 'CartPole-v1'
        env=gym.make(environment_name,render_mode="human")
    The model variable has my trained model stored in it and is initialized as such:
        model =PPO.load(PPO_Path, env=env)
    The print(obs) call returns this value:
        (array([ 0.03954345, -0.04975226, -0.02942382, -0.02261402], dtype=float32), {})
    I am running this code in a notebook in VS Code on an M2 MacBook running macOS 13.5. I am using Python 3.9.15 and the latest version of all the other libraries and dependencies. Please help submitted by /u/Straight-Knowledge83 [link] [comments]  ( 9 min )
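    The printed observation is a tuple of the form `(array, {})`, which matches the Gym 0.26+ / Gymnasium API, where `reset()` returns `(obs, info)` and `step()` returns five values. Passing that whole tuple to `model.predict` is a likely cause of the inhomogeneous-shape error. The sketch below shows the unpacking pattern; `StubEnv` and `StubModel` are hypothetical stand-ins for `gym.make(...)` and `PPO.load(...)` so that the loop runs without any dependencies.

```python
import random

class StubEnv:
    """Hypothetical stand-in that mimics the Gym >= 0.26 reset/step signatures."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def _obs(self):
        # Four floats, like a CartPole observation.
        return [self.rng.uniform(-0.05, 0.05) for _ in range(4)]

    def reset(self):
        self.t = 0
        return self._obs(), {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False                    # this stub never fails early
        truncated = self.t >= self.max_steps  # time limit reached
        return self._obs(), 1.0, terminated, truncated, {}

class StubModel:
    """Hypothetical stand-in for a trained policy exposing predict()."""

    def predict(self, obs):
        assert len(obs) == 4, "expects the raw observation, not (obs, info)"
        return (0 if obs[2] < 0 else 1), None

def run_episodes(env, model, episodes=5):
    scores = []
    for _ in range(episodes):
        obs, _info = env.reset()              # unpack the tuple from reset()
        done, score = False, 0.0
        while not done:
            action, _state = model.predict(obs)
            obs, reward, terminated, truncated, _info = env.step(action)
            done = terminated or truncated    # stop on either signal
            score += reward
        scores.append(score)
    return scores

scores = run_episodes(StubEnv(), StubModel())
```

    In the original loop the corresponding changes would be `obs, _ = env.reset()` and `done = terminated or truncated`, assuming the installed Gym version follows the five-value `step()` API, as the existing unpacking suggests.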
    Reinforcement Learning Platform for UAVs
    I'm doing a project that aims to use reinforcement learning (PPO variations) with UAVs. What are the most up-to-date tools for implementing and trying new RL algorithms in this space? I've looked at AirSim, and it seems to no longer be supported by Microsoft. I've also been looking heavily at Flightmare, which is almost exactly what I want, but getting a tool that hasn't been maintained for years up and running is giving me headaches (and the documentation is not great or up to date either). Ultimately, what I'm looking for is:
    * Physics simulation
    * Photo-realistic vision
    * Built-in integration with Gym would be awesome
    * Python platform preferred, C++ also ok
    I've also used ROS/Gazebo with PyTorch previously, and that is my backup plan I suppose, but it's not photo-realistic and is kind of slow in my experience. submitted by /u/zeus_the_transistor [link] [comments]  ( 9 min )
    Training a RL Model with Continuous State & Action Space in a Real-World Scenario
    Hello everyone, I'm a Data Science student diving into an exciting thesis topic: using reinforcement learning to stabilize boats in rough seas by adjusting a keel's angle. But I am a bit concerned about the high complexity of the problem and the given situation: Action Space: Continuous, representing the keel's angle adjustments. State Space: Continuous, capturing the dynamic behavior of the sea, including waves. Training Environment: Currently, the company only has a real-world water tank setup to simulate the sea conditions. There's no computer simulation available. Given this setup, I have a couple of concerns: Is it possible to train an RL model effectively in such a complex real-world scenario without first having a computer simulation? And if yes, what would be your initial steps in doing so? Are there possibilities to reduce the problem's complexity while training exclusively in the real-world water tank simulation? (i.e. transforming the action space into a discrete action space?) Any insights or advice would be greatly appreciated! submitted by /u/No-Wasabi3556 [link] [comments]  ( 9 min )
    Supercharging reinforcement learning with logic
    Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation, have limited its applicability in domains where simulations run more slowly. Our solution is to use a logic-based framework, PyReason, as a proxy for the simulation. https://preview.redd.it/6wmg0qnlaaub1.png?width=1786&format=png&auto=webp&s=01f82cf24de79b317b6f9406b0b6379b949a34d3 We showed that inference with a PyReason logic program can provide up to a three-order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2). https://preview.redd.it/u8f44fskaaub1.png?width=1636&format=png&auto=webp&s=9509f03a936f41cd0131388564833b86a39c295a However, the benefits of our semantic proxy go well beyond performance. The use of temporal logic programming has two crucial beneficial by-products: symbolic explainability and modularity. PyReason provides an explainable symbolic trace that captures the evolution of the environment in a precise manner, while modularity allows us to add or remove aspects of the logic program, allowing for adjustments to the simulation based on a library of behaviors. PyReason is well-suited to modeling simulated environments for other reasons as well, namely the ability to directly capture non-Markovian relationships and its open-world nature (defaults are “uncertain” instead of true or false). We have demonstrated that agents can be trained with standard RL techniques such as DQN using this framework.
Preprint: https://arxiv.org/abs/2310.06835 Video: https://youtu.be/9e6ZHJEJzgw Code for PyReason-as-a-Sim (integration with DQN): https://github.com/lab-v2/pyreason-rl-sim Code for PyReason Gym: https://github.com/lab-v2/pyreason-gym PyReason Home: neurosymbolic.asu.edu/pyreason/ ​ submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Actor-critic on piecewise constant reward function
    I made an environment with a piecewise constant reward function for testing the network architecture. Its episode length is 1. The critic will try to learn this reward and become a piecewise constant function itself, so its gradient will be close to 0, making the gradient vanish for the policy. I can think of one solution: change the reward function to a dense reward. But I wanted some other views; has anyone solved such problems? submitted by /u/Automatic-Web8429 [link] [comments]
    Help understanding the PETS algorithm
    I am trying to read this paper and I am unable to get the big picture. Can someone please explain what's going on in the Propagation and Planning stages? In the Model stage, I understand that they are using a probabilistic model to handle uncertainty. https://preview.redd.it/idenqd492aub1.png?width=945&format=png&auto=webp&s=40da9bf53b21dbed63b70571f3833b0fe3a9dabb For instance, what does Particle mean in this paper? The big picture here is that I am trying to understand the Model Based Policy Optimization paper, and it seemed like they built upon the above paper. submitted by /u/Academic-Rent7800 [link] [comments]
  • Open

    Supercharging reinforcement learning with logic
    Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation, have limited its applicability in domains where simulations run more slowly. Our solution is to use a logic-based framework, PyReason, as a proxy for the simulation. https://preview.redd.it/pmukb2k7aaub1.png?width=1786&format=png&auto=webp&s=3fb36d0fbeb75393ae8f71f8f369ff5e0b79fbcb We showed that inference with a PyReason logic program can provide up to a three-order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2). https://preview.…

  • Open

    [D] Detect anomaly with small dataset
    Hi guys, I'm hoping for advice on how to detect patterns/anomalies at a small scale. I understand there are tools out there for webpage monitoring, but as an example, say I'm ingesting a small amount of hourly/daily traffic to a sub-page on my site (anywhere from 50-100 visits per day, which may mean a maximum of ~30 visits per hour). There are times when traffic to the page drops because the page doesn't fully load, or because the other page that hosts the link to this sub-page doesn't load (so people can't see the link to this sub-page). Given the scope and scale of this and the amount of data, it's not possible for me to use other solutions for anomaly detection (those that cost $100-$1000+/month), and I'm not sure where to start with ML with this minimal amount of hourly/daily data to monitor. Is there anything that I should look into? Thank you submitted by /u/duyth [link] [comments]  ( 9 min )
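    With counts this small, one simple baseline that needs no paid tooling is a Poisson tail test: compare each period's visit count with the historical average for that period and flag it when seeing so few visits would be very unlikely. This is a minimal sketch rather than a recommendation of a specific product; the expected rate of 75 visits per day, the sample counts, and the 0.01 threshold are illustrative assumptions.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def is_traffic_drop(observed, expected_rate, alpha=0.01):
    """Flag a period whose count is improbably low under the expected rate."""
    return poisson_cdf(observed, expected_rate) < alpha

# Illustrative daily counts against an assumed historical mean of 75 visits/day.
daily_counts = [70, 82, 40, 66]
flags = [is_traffic_drop(c, expected_rate=75.0) for c in daily_counts]
```

    At only a few visits per hour, even zero visits is not unusual enough to flag, so it may work better to test on daily totals or rolling multi-hour windows, and to keep a separate expected rate per weekday if traffic is seasonal.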
    [D] Foundational must reads for LLMs
    Came across this post https://community.openai.com/t/foundational-must-read-gpt-llm-papers/197003 As I am new to LLMs, please share your thoughts on how to start and which subtopics to learn in depth. submitted by /u/Electrical_Study_617 [link] [comments]  ( 8 min )
    [D] Google AutoML Alternatives?
    Having jumped into AI this last year, I've used Google AutoML a lot and it's honestly worked great. I primarily use it for text classification. Training usually takes anywhere from 4-8 hours. The results have been above 90% accurate on inference. Now, the problem: cost. It's super expensive to run an endpoint for predictions with Google AutoML for text classification. I'm wondering if anyone has any alternatives or ideas for getting similar results more cheaply. I am ok waiting a bit for prediction results, as I don't need sub-1ms type responses lol. But everything I've tried has yielded less than optimal results. I tried various Hugging Face models, and accuracy is about 50%. submitted by /u/zepaz [link] [comments]  ( 9 min )
    [R] Octopus: Embodied Vision-Language Programmer from Environmental Feedback - Nanyang Technological University 2023 - Continually refines its understanding and execution, demonstrating impressive adaptability!
    Paper: https://arxiv.org/abs/2310.08588 Blog: https://choiszt.github.io/Octopus/ Github: https://github.com/dongyh20/Octopus Youtube short: https://www.youtube.com/watch?v=lHbTvB0yIP4 Abstract: Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied agent, it signifies a crucial stride towards the creation of autonomous and context-aware systems capable of formulating plans and executing commands with precision. In this paper, we introduce Octopus, a novel VLM designed to proficiently decipher an agent's vision and textual task objectives and to formulate intricate action sequences and generate executable code. Our design allows the agent to adeptly handle a wide spectrum of tasks, ranging from mundane daily chores in simulators to sophisticated interactions in complex video games. Octopus is trained by leveraging GPT-4 to control an explorative agent to generate training data, i.e., action blueprints and the corresponding executable code, within our experimental environment called OctoVerse. We also collect the feedback that allows the enhanced training scheme of Reinforcement Learning with Environmental Feedback (RLEF). Through a series of experiments, we illuminate Octopus's functionality and present compelling results, and the proposed RLEF turns out to refine the agent's decision-making. By open-sourcing our model architecture, simulator, and dataset, we aspire to ignite further innovation and foster collaborative applications within the broader embodied AI community. 
https://preview.redd.it/1zn9q3g7a8ub1.jpg?width=1651&format=pjpg&auto=webp&s=3b14f862b24784918d6b4514bf575cf29bc65edf https://preview.redd.it/sv2y06g7a8ub1.jpg?width=1079&format=pjpg&auto=webp&s=be9ab7dd7cf23018b6d1fa0c584ad301b04c8abf https://preview.redd.it/350xc6g7a8ub1.jpg?width=942&format=pjpg&auto=webp&s=53e57541d35ca23d06b8c5be71c2b0c1910fdf90 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] My article about autonomous LLM-based agents: Chain of Thought, Plan and Solve, Self-Ask, ReAct, Reflexion, Self-Consistency, ToT, and GoT; and insights into the architecture behind autonomous LLM-based agents.
    A Complete Guide to LLMs-based Autonomous Agents (Part I): https://medium.com/p/69515c016792 My article offers a comprehensive overview of LLM-based agents, covering Chain of Thought, Plan and Solve/Execute, Self-Ask, ReAct, Reflexion, Self-Consistency, Tree of Thoughts, and Graph of Thoughts. It traces their evolution from basic forms, driven primarily by prompt engineering, to advanced models that emulate human problem-solving intricacies. Moreover, it provides an engineer's insights into the architecture behind these autonomous agents. Naturally suitable for AI agent: LLMs feature a natural language interface tailored for user-computer interactions and they come equipped with innate reasoning abilities. LLM's Deficiency: Despite its strengths, GPT-4 can provide incorrect answers or hallucinations for complex tasks. Challenges with Training: Finetuning pretrained LLMs doesn't enhance reasoning capabilities. While creating a larger LLM can bolster its problem-solving skills, the process can span several months to a year, potentially leading to a two-year wait before its official launch. Closed Model and RAG: LLMs, once trained, are unable to fetch real-time data and have inherent shortcomings. However, for Q&A tasks, leveraging an open-book method proves more effective. The aim is not to have an all-knowing model but one skilled in reasoning and utilizing tools. LLM Agent Approach: We direct LLMs to break down intricate tasks, tackle individual sub-tasks, evaluate them, and make revisions of the strategy as needed. submitted by /u/Appropriate-Map-9923 [link] [comments]  ( 9 min )
    [D] Have a research paper to do for my master's in Big Data Analytics. Wanted to do something with ML. Just looking for some advice.
    I'm in my last semester and we have to pick a topic related to big data analytics. Right now I have to prepare a proposal for my topic, which will have something to do with ML and the medical field. Current plan: Get a dataset related to my topic. Right now it's Parkinson's disease. My question is, would I need a dataset with text data, or would images of brain scans be better for, say, early detection? I can't figure out which would be the better dataset. Get the dataset and then use Azure Machine Learning to prepare it, do some data cleaning and handling, and then get a model out of it. I picked Azure because I have an Azure license from my uni and, after searching around, I read about the Azure Machine Learning service. Would Azure be a good choice for training my model on this task? I've mostly used Google Colab for training small models. Once the model is trained and set up, I want to set up a front-end web app (Flask) and connect my model so that users can upload either text data or image scans and the model outputs results for the inputted data. My question is, would it be ideal to have the model located on my local machine, or would Azure let me make API calls from my local machine to the Azure-trained model? Would all this be feasible to do? I'm not looking to develop a full-fledged application; I just want to create a model with a dataset of images or text and then be able to feed new images to the trained model and get an output. Just looking for opinions or advice on this topic. Thanks. submitted by /u/Jesustakethewheeeeel [link] [comments]  ( 9 min )
    [D] Is the topic of your ML PhD important?
    I read the previous discussion on whether a PhD is required in the field, and I had a follow-up question: does the topic of your PhD matter? So let’s say you finish a PhD in the field of medical machine learning (non-CV). Would an automotive company, FAANG, or e.g. DeepMind still want to hire you if you wanted to switch your sub-field a bit? Or are you simply less desirable than a candidate without a PhD but with more experience in CV? I am asking this because I would like to stay flexible, as there are many ML sub-fields I want to work in, and I do not want to limit my options by pursuing a PhD in a topic that I don’t want to work in for my entire life. For context, I already have 2 years of working experience as an AI engineer and I am finishing my AI master’s. submitted by /u/Otoz123 [link] [comments]  ( 9 min )
    [D] Fine-Tuning tortoise tts
    I'm planning on creating my own AI voice to use with ChatGPT. I have done my research, and there are two ways to achieve a quality TTS model to use. I have tried them both. I fine-tuned tortoise tts on my own 20-minute dataset. I have also tried to create a model using Tacotron2 and the dataset. The quality of the fine-tuned model is better. But one downside is that I still have to give the fine-tuned tortoise model a reference voice for it to choose the voice that I fine-tuned it with. On the other hand, the trained model didn't need to. The question here is: why didn't the tortoise model choose the voice in the dataset as the default? Do I need to expand my dataset for it to be chosen as the main voice? ​ Thank all. submitted by /u/Capital_Birthday_654 [link] [comments]  ( 9 min )
    [R] Do pretrained Transformers Really Learn In-context by Gradient Descent?
    Do pretrained Transformers Really Learn In-context by Gradient Descent? https://x.com/Shadowkiller331/status/1713003711629516862?s=20 ​ https://preview.redd.it/zpwkh47hm7ub1.png?width=450&format=png&auto=webp&s=6def807c9c9f605e3f7839159db3402d837f6895 submitted by /u/Educational-Newt2052 [link] [comments]  ( 8 min )
    [R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning
    submitted by /u/markurtz [link] [comments]  ( 8 min )
    A[R]xiv [D]ives - Llama 2 Deep Dive
    We’ve been diving deep into foundational papers on Fridays as a group. It’s been helpful for us to get into the nitty gritty details of these papers, so hope you find it helpful too. Would love to have anyone join the discussion next week! submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [N] Most detailed human brain map ever contains 3,300 cell types
    What can this mean to artificial neural networks? submitted by /u/hhh888hhhh [link] [comments]  ( 8 min )
    [D] My fine tune behaves like the base model
    Hi all, I did a fine tune of CodeLlama-7b on a custom dataset and I was getting very excited because it was doing very well on evals. I saved the model with model.save() and model.push_to_hub() and it seemed to work. When I load the model it shows the structure with the Lora_A and Lora_B for every layer, but it now acts like the base model with no changes. Is it possible I saved wrong or likely that I am loading wrong? Any help is greatly appreciated! submitted by /u/cstein123 [link] [comments]  ( 9 min )
    [R] Machine Learning Courses Mega Bundle from Mammoth Interactive
    submitted by /u/brand_momentum [link] [comments]  ( 8 min )
    [R] best RL algorithm for a single turn game ?
    Hi there, I'm new to Reinforcement Learning (RL), and the papers I've come across mainly focus on scenarios where states change with choices in a game. However, I'm interested in finding the best RL algorithm for a simpler case. I have an input I and a policy P. P outputs probabilities for available choices (a limited set of integers), and a reward r is given for each choice (the reward is costly to compute that’s why I use RL). The goal is to train P to maximize the reward. So as if we are in a game that ends after only one choice. Any recommendations for the best RL algorithm in this case? Thanks! submitted by /u/Meddhouib10 [link] [comments]  ( 9 min )
    [D] Ways to get research experience before grad school
    I recently graduated with my bachelor's from a low ranked school with a good GPA. I was planning on starting a PhD studying NLP and applied to 12 mid level schools. However, I was unfortunately rejected from all the schools I applied to. I suspect it was likely due to my lack of experience in NLP research as my school didn't have any professors who do research in that area. My current plan is to work in industry for the next two years and try and do some NLP research on the side before reapplying. Do any NLP labs allow for external volunteer researchers? Besides that, are there any other ways to get research experience? submitted by /u/Bananas970 [link] [comments]  ( 9 min )
    [D] Validation loss is decreasing but WER is increasing in Whisper model training.
    Hi, I've been using the Hugging Face library to fine-tune the Whisper model. While the WER was initially decreasing, I've noticed it began to rise even though the validation loss continues to drop. Could the issue be related to my testing on a very small dataset? After the 80th step, the WER suddenly started increasing from 13 to 28. submitted by /u/aadityaura [link] [comments]  ( 9 min )
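    For context on why the two metrics in this post can move apart: in a typical setup the validation loss is a per-token cross-entropy, while WER is a word-level edit distance on the decoded text, so a few inserted or repeated words can raise WER sharply. A minimal sketch of the standard WER definition follows; the function name is hypothetical, and real evaluations usually normalize case and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

    Note that WER is not bounded by 1: a hypothesis with many extra words can score above 100%, which is one way a small evaluation set can show a sudden jump.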
    Looking for An Easy-To-Use API To Train Image Model [D]
    Yo! I have some images I curated on MJ, I want to run these together into an AI and spit out more outputs like these. The current process has me get maybe .2% successful outputs through MJ I figure the next step to more outputs is training a custom model. What's the easiest way to do this using a web-based API? Does this involve using Stable Diffusion? submitted by /u/AdministrativePie991 [link] [comments]  ( 9 min )
    [D] Time Series Forecasting on positive AND negative Examples
    Hey 😀 not sure if extremely trivial or really tricky. In the end, I want a machine that generates a time series without further input based on training data, generating a new time series every time. I want this to be based on a transformer. I want it trained with data looking like this:
        2023-07-03 14:19:48,GOOD
        2023-07-04 13:59:07,GOOD
        2023-07-05 01:58:54,GOOD
        2023-07-05 03:30:05,BAD
        2023-07-05 05:17:43,BAD
        2023-07-06 05:35:34,GOOD
        2023-07-07 14:06:03,GOOD
        2023-07-08 21:16:05,BAD
    with “GOOD” and “BAD” being the state of the system which is likely dependent on the time series data up to that point. I have a lot of data and it’s data points like the one above with maybe a hundred rows of data on average for a few thousand systems. Every system is independent of all others but all are identical. I do not want to train only on “GOOD” as this would leave out a lot of valuable data … Is there a way to train a time series transformer with both data that leads to GOOD as well as BAD outcomes, so it would generate time series from scratch that are unlikely to have BAD outcomes? Thank you!! submitted by /u/_VeniVidiVeni_ [link] [comments]  ( 9 min )
    [P] VGSLify: Transform Your TensorFlow Model Prototyping Experience
    Hey r/MachineLearning! 🚀 Have you ever been frustrated with the lengthy and sometimes cumbersome TensorFlow code for defining models? Or wished you could experiment with different architectures without dealing with copious lines of code? That's where VGSLify steps in. Why Use VGSLify? Compact Definitions: VGSLify leverages VGSL spec, enabling you to express intricate model architectures in a compact and elegant manner. This means you can quickly experiment with different models by simply tweaking a string format, bypassing the verbose code traditionally required. Swift Prototyping: Craft intricate neural network architectures using succinct VGSL spec strings, allowing you to iterate faster and more efficiently. From TensorFlow to VGSL: Got a pre-existing TensorFlow model? Easily co…  ( 9 min )
    [D] How important is a PhD for industry?
    I'm 21 years old and currently pursuing a master's degree in theoretical physics in the UK. I have a strong interest in machine learning and have completed many computing courses as well as independent projects in this field. I'm considering a career in machine learning and I'm curious about the benefits of doing a PhD. I've heard that the salary difference may not be substantial. Could anyone provide insights on how important a PhD is for specific roles in this field? Additionally, what factors should I consider when deciding whether to pursue a PhD in machine learning, apart from my passion for ML? Also, are private PhDs common in ML, that is, working at a company and asking to pursue a PhD within the company? Thanks :) submitted by /u/Neat-Print2792 [link] [comments]  ( 9 min )
    [D] SHAP mask_token: why does it matter and which one to choose?
    submitted by /u/Being-Nothingness [link] [comments]  ( 9 min )
    [R] Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] is there a good Code or Text model ?
    I am trying to detect code segments in the text response of an LLM so I can highlight them using highlight.js. Is there a good model that can classify a block of text and decide whether it is a block of code or a block of plain natural-language text (English)? submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
  • Open

    Looking for developers / future founders who want to build and grow disruptive AI apps.
    I am a multi-time founder myself. I've secured millions from investors for my past startups and had notable success with a video app that gathered 4M users and $300k in revenue. However, due to the intense competition in the video app editing sector, my team and I couldn't turn a profit. After my last startup faltered during the covid period, I transitioned to being a full-time product-market fit and growth marketing consultant and have made really great money doing it. I assist new startups in avoiding the mistakes I made and implement frameworks that significantly increase their chances of success. I've observed that many new founders venture into startups without fully grasping the challenges of building something people genuinely desire. It’s really not easy. How would you know what y…
    AI Images Detectors Are Being Used to Discredit the Real Horrors of War
    A free AI image detector is being used to discredit a photograph of a burnt corpse of a baby killed in Hamas's attack on Israel. However, experts have pointed out that the image does not show any signs of being created by AI. The idea that the image is AI-generated has spread on Twitter, suggesting that official Israeli accounts are spreading AI-generated misinformation. AI image generators have trouble replicating reality accurately, and the shadows in the photograph are consistent with a real image. Multiple AI image detection tools have also determined that the image is not AI-generated. Source : https://www.404media.co/ai-images-detectors-are-being-used-to-discredit-the-real-horrors-of-war/ submitted by /u/NuseAI [link] [comments]
    Seeking a Community for Open-Source AI Code Generation Models
    Hello everyone! 🌟 I hope this post finds you well. I've been delving deeper into the world of AI code generation recently and am curious to discover if there are communities or platforms specifically dedicated to open-source AI code generation models. I'm aware of Hugging Face, but are there any others besides that? Here's what I'm looking for: Collaboration: A space where enthusiasts and experts alike can collaborate on projects, share insights, and improve upon existing models. Discussion: Forums or chat platforms where discussions around the challenges, breakthroughs, and best practices in AI code generation take place. Resource Sharing: A repository or platform where open-source models, datasets, and related tools can be freely shared and accessed. Learning and Tutorials: Any resources that can help newcomers grasp the concepts and intricacies of AI code generation. If you know of any such community or are part of one, please do let me know. submitted by /u/akanshtyagi [link] [comments]
    Mickey, what are you doing?
    submitted by /u/LeviJr00 [link] [comments]
    Updates to my Capstone Project with Enhanced Features and still freely available to all (until OpenAI credits deplete - Free ChatGPT4). Hoping to introduce the community feature too where people can generate STEM animations to aid learning
    submitted by /u/Raymondlkj [link] [comments]
    learning for school with an AI
    does anyone know if there is an AI online where you can import documents and the AI is forming and asking you questions about that topic on the document? like an AI who generates test for you to be prepared for every potential question in a school test. submitted by /u/satanskittenz [link] [comments]
    Creative Question: Your ideas for AI generative reality
    Ok so we have AI generated content, First text, then images, then videos. What will the world look like when we have a generative world? Generative objects, Generative Games, Generative Moods, Generative memories, Generative senses and perceptions, Generative Environments, Generative Reality. Anyone want to talk about what it might look like? ( I would like to hear a unhinged idea for what might happen, Speculative of course ) submitted by /u/rolyataylor2 [link] [comments]
  • Open

    AI’s Kryptonite: Data Quality
    The ability of Generative AI (GenAI) tools to deliver accurate and reliable outputs entirely depends on the accuracy and reliability of the data used to train the Large Language Models (LLMs) that power the GenAI tool. Unfortunately, the Law of GIGO – Garbage In, Garbage Out – threatens the widespread adoption of GenAI. Whether generating… The post AI’s Kryptonite: Data Quality appeared first on Data Science Central.  ( 22 min )
  • Open

    "Pitfalls of learning a reward function online", Armstrong et al 2020 {DM}
    submitted by /u/gwern [link] [comments]
  • Open

    Newton line
    Let Q be a convex quadrilateral with at most two parallel sides. Draw the two diagonals, then draw a line through their midpoints. This line is called the Newton line. (The requirement that at most two sides are parallel ensures that the midpoints are distinct and so there is a unique line joining them.) In […] Newton line first appeared on John D. Cook.  ( 5 min )
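    The construction in this entry can be sketched in a few lines. This is a minimal illustration, assuming vertices are given in order as (x, y) tuples; the function names are hypothetical. It returns the two diagonal midpoints that determine the Newton line and rejects the parallelogram case, where the midpoints coincide.

```python
def midpoint(p, q):
    """Midpoint of the segment from p to q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def newton_line(a, b, c, d):
    """Return the two diagonal midpoints of quadrilateral ABCD (vertices in order)."""
    m1 = midpoint(a, c)  # midpoint of diagonal AC
    m2 = midpoint(b, d)  # midpoint of diagonal BD
    if m1 == m2:
        raise ValueError("diagonal midpoints coincide (parallelogram): no unique line")
    return m1, m2


def is_on_line(p, m1, m2, tol=1e-9):
    """True if point p lies on the line through m1 and m2 (cross product test)."""
    cross = (m2[0] - m1[0]) * (p[1] - m1[1]) - (m2[1] - m1[1]) * (p[0] - m1[0])
    return abs(cross) <= tol
```

    One easy check: the average of the four vertices is the midpoint of the segment joining the two diagonal midpoints, so it always lies on the Newton line.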
  • Open

    FABind: Fast and Accurate Protein-Ligand Binding. (arXiv:2310.06763v2 [cs.LG] UPDATED)
    Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose FABind, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. FABind incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed FABind demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available on GitHub at https://github.com/QizhiPei/FABind.  ( 2 min )
    On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks. (arXiv:2310.08150v1 [math.ST])
    Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariance function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics hold true. As applications we discuss long-only minimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis and the analysis of array-of-sensors data.  ( 2 min )
    GRASP: Accelerating Shortest Path Attacks via Graph Attention. (arXiv:2310.07980v1 [cs.LG])
    Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.  ( 2 min )
    GenTKG: Generative Forecasting on Temporal Knowledge Graph. (arXiv:2310.07793v1 [cs.CL])
    The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional carefully designed embedding-based and rule-based models dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex temporal graph data structure and sequential natural expressions LLMs can handle, and between the enormous data sizes of tKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval augmented generation framework that performs generative forecasting on tKGs named GenTKG, which combines a temporal logical rule-based retrieval strategy and lightweight parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting under low computation resources. GenTKG also highlights remarkable transferability with exceeding performance on unseen datasets without re-training. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.  ( 2 min )
    Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization. (arXiv:2310.07985v1 [cs.LG])
    Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.  ( 2 min )
    Variational operator learning: A unified paradigm marrying training neural operators and solving partial differential equations. (arXiv:2304.04234v2 [cs.LG] UPDATED)
    Neural operators, novel neural architectures for rapidly approximating the solution operators of partial differential equations (PDEs), have shown considerable promise for future scientific computing. However, the mainstream approach to training neural operators is still data-driven, which needs an expensive ground-truth dataset from various sources (e.g., solving PDEs' samples with the conventional solvers, real-world experiments) in addition to training stage costs. From a computational perspective, marrying operator learning and specific domain knowledge to solve PDEs is an essential step in reducing dataset costs and enabling label-free learning. We propose a novel paradigm that provides a unified framework for training neural operators and solving PDEs with the variational form, which we refer to as variational operator learning (VOL). Ritz and Galerkin approaches with finite element discretization are developed for VOL to achieve matrix-free approximation of the system functional and residual, then direct minimization and iterative update are proposed as two optimization strategies for VOL. Various types of experiments based on reasonable benchmarks about variable heat source, Darcy flow, and variable stiffness elasticity are conducted to demonstrate the effectiveness of VOL. With a label-free training set and a 5-label-only shift set, VOL learns solution operators with its test errors decreasing in a power law with respect to the amount of unlabeled data. To the best of the authors' knowledge, this is the first study that integrates the perspectives of the weak form and efficient iterative methods for solving sparse linear systems into the end-to-end operator learning task.  ( 3 min )
    Diffusion-based Generative AI for Exploring Transition States from 2D Molecular Graphs. (arXiv:2304.12233v3 [physics.chem-ph] UPDATED)
    The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance for prediction of TS geometries. However, they require 3D conformations of reactants and products, often with their appropriate orientations, as input, which demands substantial effort and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for prediction of TS geometries just from 2D molecular graphs. TSDiff outperformed the existing ML models with 3D geometries in terms of both accuracy and efficiency. Moreover, it enables sampling of various TS conformations, because it learned the distribution of TS geometries for diverse reactions in training. Thus, TSDiff was able to find more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff shows promising potential for efficient and reliable TS exploration.  ( 2 min )
    LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models. (arXiv:2304.00457v3 [cs.CL] UPDATED)
    Large Language Models (LLMs) have revolutionized natural language processing and demonstrated impressive capabilities in various tasks. Unfortunately, they are prone to hallucinations, where the model exposes incorrect or false information in its responses, which renders diligent evaluation approaches mandatory. While LLM performance in specific knowledge fields is often evaluated based on question and answer (Q&A) datasets, such evaluations usually report only a single accuracy number for the dataset, which often covers an entire field. This field-based evaluation is problematic with respect to transparency and model improvement. A stratified evaluation could instead reveal subfields where hallucinations are more likely to occur and thus help to better assess LLMs' risks and guide their further development. To support such stratified evaluations, we propose LLMMaps as a novel visualization technique that enables users to evaluate LLMs' performance with respect to Q&A datasets. LLMMaps provide detailed insights into LLMs' knowledge capabilities in different subfields by transforming Q&A datasets as well as LLM responses into an internal knowledge structure. Furthermore, an extension for comparative visualization allows for the detailed comparison of multiple LLMs. To assess LLMMaps we use them to conduct a comparative analysis of several state-of-the-art LLMs, such as BLOOM, GPT-2, GPT-3, ChatGPT and LLaMa-13B, as well as two qualitative user evaluations. All necessary source code and data for generating LLMMaps to be used in scientific publications and elsewhere is available on GitHub: https://github.com/viscom-ulm/LLMMaps  ( 3 min )
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v9 [stat.ML] UPDATED)
    This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR) that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodical experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.  ( 3 min )
    Conditional Mutual Information for Disentangled Representations in Reinforcement Learning. (arXiv:2305.14133v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.  ( 2 min )
    Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach. (arXiv:2309.00848v2 [cs.CV] UPDATED)
    This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights to incorporate new strategies into the established solution.
    Flood and Echo: Algorithmic Alignment of GNNs with Distributed Computing. (arXiv:2310.06970v2 [cs.LG] UPDATED)
    Graph Neural Networks are a natural fit for learning algorithms. They can directly represent tasks through an abstract but versatile graph structure and handle inputs of different sizes. This opens up the possibility for scaling and extrapolation to larger graphs, one of the most important advantages of an algorithm. However, this raises two core questions: (i) How can we enable nodes to gather the required information in a given graph (information exchange), even if it is far away? (ii) How can we design an execution framework which enables this information exchange for extrapolation to larger graph sizes (algorithmic alignment for extrapolation)? We propose a new execution framework that is inspired by the design principles of distributed algorithms: Flood and Echo Net. It propagates messages through the entire graph in a wave-like activation pattern, which naturally generalizes to larger instances. Through its sparse but parallel activations it is provably more efficient in terms of message complexity. We study the proposed model and provide both empirical evidence and theoretical insights in terms of its expressiveness, efficiency, information exchange and ability to extrapolate.
    WiGenAI: The Symphony of Wireless and Generative AI via Diffusion Models. (arXiv:2310.07312v2 [cs.IT] UPDATED)
    Innovative foundation models, such as GPT-3 and stable diffusion models, have made a paradigm shift in the realm of artificial intelligence (AI) towards generative AI-based systems. In unison, from data communication and networking perspective, AI and machine learning (AI/ML) algorithms are envisioned to be pervasively incorporated into the future generations of wireless communications systems, highlighting the need for novel AI-native solutions for the emergent communication scenarios. In this article, we outline the applications of generative AI in wireless communication systems to lay the foundations for research in this field. Diffusion-based generative models, as the new state-of-the-art paradigm of generative models, are introduced, and their applications in wireless communication systems are discussed. Two case studies are also presented to showcase how diffusion models can be exploited for the development of resilient AI-native communication systems. Specifically, we propose denoising diffusion probabilistic models (DDPM) for a wireless communication scheme with non-ideal transceivers, where 30% improvement is achieved in terms of bit error rate. As the second application, DDPMs are employed at the transmitter to shape the constellation symbols, highlighting a robust out-of-distribution performance. Finally, future directions and open issues for the development of generative AI-based wireless systems are discussed to promote future research endeavors towards wireless generative AI (WiGenAI).
    OWAdapt: An adaptive loss function for deep learning using OWA operators. (arXiv:2305.19443v2 [cs.LG] UPDATED)
    In this paper, we propose a fuzzy adaptive loss function for enhancing deep learning performance in classification tasks. Specifically, we redefine the cross-entropy loss to effectively address class-level noise conditions, including the challenging problem of class imbalance. Our approach introduces aggregation operators, leveraging the power of fuzzy logic to improve classification accuracy. The rationale behind our proposed method lies in the iterative up-weighting of class-level components within the loss function, focusing on those with larger errors. To achieve this, we employ the ordered weighted average (OWA) operator and combine it with an adaptive scheme for gradient-based learning. Through extensive experimentation, our method outperforms other commonly used loss functions, such as the standard cross-entropy or focal loss, across various binary and multiclass classification tasks. Furthermore, we explore the influence of hyperparameters associated with the OWA operators and present a default configuration that performs well across different experimental settings.  ( 2 min )
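    To make the aggregation step in this abstract concrete, here is a minimal sketch of the standard ordered weighted average (OWA) operator applied to per-class loss values. The abstract does not specify OWAdapt's actual weights or its adaptive scheme, so `linear_decreasing_weights` is a hypothetical choice used only to show how an OWA can emphasize the classes with larger errors.

```python
def owa(values, weights):
    """Ordered weighted average: weights are applied to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def linear_decreasing_weights(n):
    """Hypothetical weight vector that puts the most mass on the largest values."""
    raw = [n - i for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Example: per-class mean losses for three classes (made-up numbers).
class_losses = [0.2, 1.5, 0.7]
aggregated = owa(class_losses, linear_decreasing_weights(len(class_losses)))
```

    With weights `[1, 0, 0]` the operator reduces to the maximum, with `[0, 0, 1]` to the minimum, and with uniform weights to the plain mean, so the weight vector controls how strongly the worst-performing classes dominate the loss.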
    ImageNomer: description of a functional connectivity and omics analysis tool and case study identifying a race confound. (arXiv:2302.00767v2 [q-bio.PE] UPDATED)
    Most packages for the analysis of fMRI-based functional connectivity (FC) and genomic data are used with a programming language interface, lacking an easy-to-navigate GUI frontend. This exacerbates two problems found in these types of data: demographic confounds and quality control in the face of high dimensionality of features. The reason is that it is too slow and cumbersome to use a programming interface to create all the necessary visualizations required to identify all correlations, confounding effects, or quality control problems in a dataset. To remedy this situation, we have developed ImageNomer, a data visualization and analysis tool that allows inspection of both subject-level and cohort-level demographic, genomic, and imaging features. The software is Python-based, runs in a self-contained Docker image, and contains a browser-based GUI frontend. We demonstrate the usefulness of ImageNomer by identifying an unexpected race confound when predicting achievement scores in the Philadelphia Neurodevelopmental Cohort (PNC) dataset. In the past, many studies have attempted to use FC to identify achievement-related features in fMRI. Using ImageNomer, we find a clear potential for confounding effects of race. Using correlation analysis in the ImageNomer software, we show that FCs correlated with Wide Range Achievement Test (WRAT) score are in fact more highly correlated with race. Investigating further, we find that whereas both FC and SNP (genomic) features can account for 10-15\% of WRAT score variation, this predictive ability disappears when controlling for race. In this work, we demonstrate the advantage of our ImageNomer GUI tool in data exploration and confound detection. Additionally, this work identifies race as a strong confound in FC data and casts doubt on the possibility of finding unbiased achievement-related features in fMRI and SNP data of healthy adolescents.
    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. (arXiv:2310.04610v2 [cs.AI] UPDATED)
    In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
    Conditional Sig-Wasserstein GANs for Time Series Generation. (arXiv:2006.05421v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have been extremely successful in generating samples from seemingly high-dimensional probability measures. However, these methods struggle to capture the temporal dependence of joint probability distributions induced by time-series data. Furthermore, long time-series data streams hugely increase the dimension of the target space, which may render generative modelling infeasible. To overcome these challenges, motivated by the autoregressive models in econometrics, we are interested in the conditional distribution of future time series given the past information. We propose the generic conditional Sig-WGAN framework by integrating Wasserstein-GANs (WGANs) with a mathematically principled and efficient path feature extraction called the signature of a path. The signature of a path is a graded sequence of statistics that provides a universal description for a stream of data, and its expected value characterises the law of the time-series model. In particular, we develop the conditional Sig-$W_1$ metric, which captures the conditional joint law of time series models, and use it as a discriminator. The signature feature space enables the explicit representation of the proposed discriminators, which alleviates the need for expensive training. We validate our method on both synthetic and empirical datasets and observe that our method consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.
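To make "the signature of a path" concrete, here is a minimal sketch that computes the first two signature levels of a piecewise-linear path in plain Python. It is not the Sig-WGAN code: the function name `signature_depth2` and the example path are assumptions made for illustration, and a real pipeline would use deeper truncations. The recursion relies on Chen's identity, under which appending a straight segment with increment `delta` adds `level1 ⊗ delta + 0.5 * delta ⊗ delta` to the second level.

```python
def signature_depth2(path):
    """Return (level1, level2) of the signature of a piecewise-linear path.

    `path` is a list of d-dimensional points. level1 is the total increment,
    level2[i][j] is the iterated integral of dX_i followed by dX_j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for concatenating the path so far with one linear segment.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# Hypothetical example: move right by 1, then up by 1.
example_path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
level1, level2 = signature_depth2(example_path)
levy_area = 0.5 * (level2[0][1] - level2[1][0])
```

The antisymmetric part of the second level (the Lévy area) records the order in which the coordinates moved, which is information the endpoint increment alone cannot carry.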
    Smoothed $f$-Divergence Distributionally Robust Optimization. (arXiv:2306.14041v2 [math.OC] UPDATED)
    In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple "margin" adjustments to SAA and other DRO approaches: It attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly tightest in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback-Leibler (KL) divergence smoothed by the Wasserstein or Lévy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergence, are not harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.
    Exploring the Relationship Between Model Architecture and In-Context Learning Ability. (arXiv:2310.08049v1 [cs.LG])
    What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps towards answering this question. In particular, we evaluate fifteen model architectures across a suite of synthetic in-context learning tasks. The selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, and emerging attention alternatives. We discover that all considered architectures can perform in-context learning under certain conditions. However, contemporary architectures are found to be the best performing, especially as task complexity grows. Additionally, our follow-up experiments delve into various factors that influence in-context learning. We observe varied sensitivities among architectures with respect to hyperparameter settings. Our study of training dynamics reveals that certain architectures exhibit a smooth, progressive learning trajectory, while others demonstrate periods of stagnation followed by abrupt mastery of the task. Finally, and somewhat surprisingly, we find that several emerging attention alternatives are more robust in-context learners than transformers; since such approaches have constant-sized memory footprints at inference time, this result opens the future possibility of scaling up in-context learning to vastly larger numbers of in-context examples.
    BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning. (arXiv:2308.04263v3 [cs.LG] UPDATED)
    This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with DER (Data-Efficient Rainbow) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by enforcing information spread to the whole space. This helps RL algorithms to utilize uniformly spread state representation that eventually results in a remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance in the RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.
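As a rough illustration of the Barlow Twins objective that BarlowRL builds on, the sketch below computes the redundancy-reduction loss between two batches of embeddings in plain Python. It is not the BarlowRL implementation: the helper names, the `off_diag_weight` default, and the synthetic data are assumptions. The loss pushes the cross-correlation matrix of the two views toward the identity, so diagonal entries near 1 keep the views aligned while off-diagonal entries near 0 spread information across dimensions.

```python
import math
import random


def _standardize(batch):
    """Normalise each feature to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Sum of (1 - C_ii)^2 plus weighted sum of C_ij^2 for i != j."""
    n, d = len(z_a), len(z_a[0])
    a, b = _standardize(z_a), _standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2
            else:
                loss += off_diag_weight * c_ij ** 2
    return loss


# Hypothetical data: a lightly perturbed view versus an unrelated batch.
rng = random.Random(0)
z = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(256)]
augmented = [[v + 0.1 * rng.gauss(0, 1) for v in row] for row in z]
unrelated = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(256)]
loss_matched = barlow_twins_loss(z, augmented)
loss_unrelated = barlow_twins_loss(z, unrelated)
```

Two views of the same states should give a much smaller loss than two unrelated batches, which is the signal an encoder would be trained on.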
    Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference. (arXiv:2210.11355v2 [econ.EM] UPDATED)
    We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific potential outcomes from panel data in the presence of spillover across units and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings. We propose an estimator, Network Synthetic Interventions (NSI), and show that it consistently estimates the mean outcomes for a unit under an arbitrary set of counterfactual treatments for the network. We further establish that the estimator is asymptotically normal. We furnish two validity tests for whether the NSI estimator reliably generalizes to produce accurate counterfactual estimates. We provide a novel graph-based experiment design that guarantees the NSI estimator produces accurate counterfactual estimates, and also analyze the sample complexity of the proposed design. We conclude with simulations that corroborate our theoretical findings.
    Towards Data- and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing. (arXiv:2210.15889v4 [cs.AI] UPDATED)
    Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years. As NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks, it may serve as a catalyst for the next generation of AI. In the present paper, we provide a systematic overview of the recent developments and important contributions of NeSy research. First, we introduce the history of this area, covering early work and foundations. We further discuss background concepts and identify key driving factors behind the development of NeSy. Afterward, we categorize recent landmark approaches along several main characteristics that underlie this research paradigm, including neural-symbolic integration, knowledge representation, knowledge embedding, and functionality. Next, we briefly discuss the successful application of modern NeSy approaches in several domains. Then, we benchmark several NeSy methods on three representative application tasks. Finally, we identify the open problems together with potential future research directions. This survey is expected to help new researchers enter this rapidly evolving field and accelerate the progress towards data- and knowledge-driven AI.
    GePSAn: Generative Procedure Step Anticipation in Cooking Videos. (arXiv:2310.08312v1 [cs.CV])
    We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations in natural settings. This problem has been largely overlooked in previous work. To address this challenge, we frame future step prediction as modelling the distribution of all possible candidates for the next step. Specifically, we design a generative model that takes a series of video clips as input, and generates multiple plausible and diverse candidates (in natural language) for the next step. Following previous work, we side-step the video annotation scarcity by pretraining our model on a large text-based corpus of procedural activities, and then transfer the model to the video domain. Our experiments, both in textual and video domains, show that our model captures diversity in the next step prediction and generates multiple plausible future predictions. Moreover, our model establishes new state-of-the-art results on YouCookII, where it outperforms existing baselines on the next step anticipation. Finally, we also show that our model can successfully transfer from text to the video domain zero-shot, i.e., without fine-tuning or adaptation, and produces good-quality future step predictions from video.
    GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning. (arXiv:2310.07365v2 [cs.LG] UPDATED)
    Graph-structured data is ubiquitous in the world which models complex relationships between objects, enabling various Web applications. Daily influxes of unlabeled graph data on the Web offer immense potential for these applications. Graph self-supervised algorithms have achieved significant success in acquiring generic knowledge from abundant unlabeled graph data. These pre-trained models can be applied to various downstream Web applications, saving training time and improving downstream (target) performance. However, different graphs, even across seemingly similar domains, can differ significantly in terms of attribute semantics, posing difficulties, if not infeasibility, for transferring the pre-trained models to downstream tasks. Concretely speaking, for example, the additional task-specific node information in downstream tasks (specificity) is usually deliberately omitted so that the pre-trained representation (transferability) can be leveraged. The trade-off as such is termed as "transferability-specificity dilemma" in this work. To address this challenge, we introduce an innovative deployment module coined as GraphControl, motivated by ControlNet, to realize better graph domain transfer learning. Specifically, by leveraging universal structural pre-trained models and GraphControl, we align the input space across various graphs and incorporate unique characteristics of target data as conditional inputs. These conditions will be progressively integrated into the model during fine-tuning or prompt tuning through ControlNet, facilitating personalized deployment. Extensive experiments show that our method significantly enhances the adaptability of pre-trained models on target attributed datasets, achieving 1.4-3x performance gain. Furthermore, it outperforms training-from-scratch methods on target data with a comparable margin and exhibits faster convergence.
    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability. (arXiv:2307.03135v3 [cs.CV] UPDATED)
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance student's OOD generalization: (1) by better imitating teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate their techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Poster: https://xuanlinli17.github.io/pdfs/iccv23_large_vlm_distillation_poster.pdf Code: https://github.com/xuanlinli17/large_vlm_distillation_ood
    Imitation Learning from Observation with Automatic Discount Scheduling. (arXiv:2310.07433v2 [cs.RO] UPDATED)
    Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its action, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackle ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to initially learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them.
    PromptTTS 2: Describing and Generating Voices with Text Prompt. (arXiv:2309.02285v2 [eess.AS] UPDATED)
    Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.
    A Neural-preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions. (arXiv:2310.00177v3 [math.NA] UPDATED)
    We introduce a neural-preconditioned iterative solver for Poisson equations with mixed boundary conditions. The Poisson equation is ubiquitous in scientific computing: it governs a wide array of physical phenomena, arises as a subproblem in many numerical algorithms, and serves as a model problem for the broader class of elliptic PDEs. The most popular Poisson discretizations yield large sparse linear systems. At high resolution, and for performance-critical applications, iterative solvers can be advantageous for these -- but only when paired with powerful preconditioners. The core of our solver is a neural network trained to approximate the inverse of a discrete structured-grid Laplace operator for a domain of arbitrary shape and with mixed boundary conditions. The structure of this problem motivates a novel network architecture that we demonstrate is highly effective as a preconditioner even for boundary conditions outside the training set. We show that on challenging test cases arising from an incompressible fluid simulation, our method outperforms state-of-the-art solvers like algebraic multigrid as well as some recent neural preconditioners.
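The role a preconditioner plays inside an iterative Poisson solver can be sketched with a textbook preconditioned conjugate gradient (PCG) loop. This is an illustration only and not the paper's solver: the 1D grid with homogeneous Dirichlet ends, the function names, and the Jacobi (diagonal) preconditioner are all assumptions. In the neural-preconditioned setting described above, the `precond` callable would be replaced by a trained network that approximates the inverse of the discrete Laplacian.

```python
import math


def apply_laplacian_1d(u):
    """Apply the 1D discrete Laplacian tridiag(-1, 2, -1) with zero Dirichlet ends."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(apply_a, b, precond, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = precond(r)                   # preconditioned residual
    p = list(z)                      # search direction
    rz = dot(r, z)
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, iteration
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi(r):
    """Stand-in preconditioner: divide by the diagonal of the operator."""
    return [ri / 2.0 for ri in r]


# Hypothetical test problem: unit right-hand side on 32 interior grid points.
n = 32
rhs = [1.0] * n
solution, iterations = pcg(apply_laplacian_1d, rhs, jacobi)
residual = [bi - ai for bi, ai in zip(rhs, apply_laplacian_1d(solution))]
```

Any symmetric positive definite approximation of the inverse can be dropped in for `jacobi`; the better it approximates the inverse, the fewer iterations the loop needs.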
    TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting. (arXiv:2310.04948v2 [cs.LG] UPDATED)
    The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the selection-based prompts to facilitate distribution adaptation in non-stationary time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods on a number of time series benchmark datasets. This performance gain is observed not only in standard supervised learning settings but also in scenarios involving previously unseen datasets as well as in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework.
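The first inductive bias mentioned above, splitting a series into trend, seasonal and residual components, can be sketched with a classical additive decomposition. This is not TEMPO's decomposition, which the abstract does not specify: the moving-average trend, the per-phase seasonal mean, the function name `decompose`, and the synthetic series are assumptions chosen for brevity.

```python
import math


def decompose(series, period):
    """Additive decomposition: series[t] = trend[t] + seasonal[t] + residual[t]."""
    n = len(series)
    half = period // 2
    # Trend: centred moving average, with the window truncated at the edges.
    trend = []
    for t in range(n):
        window = series[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [s - tr for s, tr in zip(series, trend)]
    # Seasonal: average detrended value at each phase of the cycle.
    phase_mean = []
    for phase in range(period):
        values = detrended[phase::period]
        phase_mean.append(sum(values) / len(values))
    seasonal = [phase_mean[t % period] for t in range(n)]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


# Hypothetical series: linear trend plus a weekly cycle.
period = 7
series = [0.1 * t + math.sin(2 * math.pi * t / period) for t in range(70)]
trend, seasonal, residual = decompose(series, period)
reconstructed = [a + b + c for a, b, c in zip(trend, seasonal, residual)]
```

Each component could then be tokenised and fed to a pre-trained model separately, which is the general idea the abstract describes; how TEMPO does this in detail is not covered here.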
    Efficient probabilistic reconciliation of forecasts for real-valued and count time series. (arXiv:2210.02286v3 [stat.ML] UPDATED)
    Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.
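The idea of reconciling by conditioning and sampling bottom-up can be sketched for the smallest possible hierarchy: two bottom series and one upper series equal to their sum. This is a simplified reading of the abstract, not the authors' algorithm: the Gaussian base forecasts, the specific means and standard deviations, and the function names are assumptions. Bottom samples are drawn from their base forecasts, weighted by how plausible their sum is under the upper base forecast, and then resampled.

```python
import math
import random


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def reconcile_by_importance_sampling(bottom_params, upper_params, n_samples, rng):
    """Return resampled bottom draws whose sums respect the upper base forecast."""
    upper_mean, upper_std = upper_params
    samples, weights = [], []
    for _ in range(n_samples):
        draw = [rng.gauss(mean, std) for mean, std in bottom_params]
        samples.append(draw)
        # Importance weight: density of the implied upper value under its base forecast.
        weights.append(normal_pdf(sum(draw), upper_mean, upper_std))
    return rng.choices(samples, weights=weights, k=n_samples)


# Hypothetical base forecasts: bottoms N(1, 1) and N(2, 1), upper N(4, 1).
rng = random.Random(0)
reconciled = reconcile_by_importance_sampling(
    bottom_params=[(1.0, 1.0), (2.0, 1.0)],
    upper_params=(4.0, 1.0),
    n_samples=50_000,
    rng=rng,
)
reconciled_upper_mean = sum(sum(draw) for draw in reconciled) / len(reconciled)
```

With these numbers the conditioned mean of the upper series has a closed form, (3/2 + 4/1) / (1/2 + 1) ≈ 3.67, which sits between the bottom-up value 3 and the upper base forecast 4; the sample estimate should land close to it. Coherence holds by construction because the upper value is always computed as the sum of the bottom draws.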
    Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts. (arXiv:2310.05898v2 [cs.LG] UPDATED)
    Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
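The bound constraint $\|x\|_\infty \leq 1/\lambda$ can be observed numerically with a small sketch of the Lion update (sign of an interpolated momentum plus decoupled weight decay). This is an illustration rather than the reference implementation: the quadratic objective, the hyperparameter values, and the function names are assumptions. Each step computes $x \leftarrow (1 - \eta\lambda)x - \eta\,\text{sign}(\cdot)$, so if $|x_i| \leq 1/\lambda$ before the step and $\eta\lambda \leq 1$, then $|x_i| \leq (1 - \eta\lambda)/\lambda + \eta = 1/\lambda$ afterwards.

```python
def sign(v):
    return (v > 0) - (v < 0)


def lion_minimize(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.99,
                  weight_decay=1.0, steps=2000):
    """Run Lion updates with decoupled weight decay and return the final iterate."""
    x = list(x0)
    m = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x)
        # Update direction: sign of the interpolation between momentum and gradient.
        direction = [sign(beta1 * mi + (1 - beta1) * gi) for mi, gi in zip(m, g)]
        x = [xi - lr * (di + weight_decay * xi) for xi, di in zip(x, direction)]
        # Momentum is refreshed with a different coefficient than the one used above.
        m = [beta2 * mi + (1 - beta2) * gi for mi, gi in zip(m, g)]
    return x


# Hypothetical objective f(x) = 0.5 * ||x - target||^2 with targets outside the box.
target = [10.0, -10.0, 0.3]
weight_decay = 1.0


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final = lion_minimize(grad, [0.0, 0.0, 0.0], weight_decay=weight_decay)
bound = 1.0 / weight_decay
```

The first two coordinates are pulled toward targets at ±10 but stop at the box boundary ±1/λ, which matches the constrained-optimization reading in the abstract; the third target lies inside the box, so the bound is not active for it.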
    Clustering Three-Way Data with Outliers. (arXiv:2310.05288v2 [stat.ML] UPDATED)
    Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.
    Federated Generalization via Information-Theoretic Distribution Diversification. (arXiv:2310.07171v2 [cs.LG] UPDATED)
    Federated Learning (FL) has surged in prominence due to its capability of collaborative model training without direct data sharing. However, the vast disparity in local data distributions among clients, often termed the non-Independent Identically Distributed (non-IID) challenge, poses a significant hurdle to FL's generalization efficacy. The scenario becomes even more complex when not all clients participate in the training process, a common occurrence due to unstable network connections or limited computational capacities. This can greatly complicate the assessment of the trained models' generalization abilities. While a plethora of recent studies has centered on the generalization gap pertaining to unseen data from participating clients with diverse distributions, the divergence between the training distributions of participating clients and the testing distributions of non-participating ones has been largely overlooked. In response, our paper unveils an information-theoretic generalization framework for FL. Specifically, it quantifies generalization errors by evaluating the information entropy of local distributions and discerning discrepancies across these distributions. Inspired by our deduced generalization bounds, we introduce a weighted aggregation approach and a duo of client selection strategies. These innovations aim to bolster FL's generalization prowess by encompassing a more varied set of client data distributions. Our extensive empirical evaluations reaffirm the potency of our proposed methods, aligning seamlessly with our theoretical construct.
    SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network. (arXiv:2310.06488v2 [cs.NE] UPDATED)
    Spiking neural networks (SNNs) have demonstrated the capability to achieve comparable performance to deep neural networks (DNNs) in both visual and linguistic domains while offering the advantages of improved energy efficiency and adherence to biological plausibility. However, the extension of such single-modality SNNs into the realm of multimodal scenarios remains an unexplored territory. Drawing inspiration from the concept of contrastive language-image pre-training (CLIP), we introduce a novel framework, named SpikeCLIP, to address the gap between two modalities within the context of spike-based computing through a two-step recipe involving "Alignment Pre-training + Dual-Loss Fine-tuning". Extensive experiments demonstrate that SNNs achieve comparable results to their DNN counterparts while significantly reducing energy consumption across a variety of datasets commonly used for multimodal model evaluation. Furthermore, SpikeCLIP maintains robust performance in image classification tasks that involve class labels not predefined within specific categories.
    GP-net: Flexible Viewpoint Grasp Proposal. (arXiv:2209.10404v3 [cs.RO] UPDATED)
    We present the Grasp Proposal Network (GP-net), a Convolutional Neural Network model which can generate 6-DoF grasps from flexible viewpoints, e.g. as experienced by mobile manipulators. To train GP-net, we synthetically generate a dataset containing depth-images and ground-truth grasp information. In real-world experiments, we use the EGAD evaluation benchmark to evaluate GP-net against two commonly used algorithms, the Volumetric Grasping Network (VGN) and the Grasp Pose Detection package (GPD), on a PAL TIAGo mobile manipulator. In contrast to the state-of-the-art methods in robotic grasping, GP-net can be used for grasping objects from flexible, unknown viewpoints without the need to define the workspace and achieves a grasp success of 54.4% compared to 51.6% for VGN and 44.2% for GPD. We provide a ROS package along with our code and pre-trained models at https://aucoroboticsmu.github.io/GP-net/.
    FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning. (arXiv:2310.07807v1 [cs.LG])
    Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. Its goal is to create a robust and accurate model by aggregating and retraining local models over multiple rounds. However, FL faces challenges regarding data heterogeneity and model aggregation effectiveness. In order to simulate real-world data, researchers use methods for data partitioning that transform a dataset designated for centralized learning into a group of sub-datasets suitable for distributed machine learning with different data heterogeneity. In this paper, we study the currently popular data partitioning techniques and visualize their main disadvantages: the lack of precision in the data diversity, which leads to unreliable heterogeneity indexes, and the inability to incrementally challenge the FL algorithms. To resolve this problem, we propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions with gradual difficulty. We introduce a metric to measure data heterogeneity among the learning agents and a transformation technique that divides any dataset into splits with precise data diversity. Through a comparative study, we demonstrate the superiority of our method over existing FL data partitioning approaches, showcasing its potential to challenge model aggregation algorithms. Experimental results indicate that our approach gradually challenges the FL strategies, and the models trained on FedSym distributions are more distinct.
    GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models. (arXiv:2310.06225v2 [cs.AI] UPDATED)
    Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding across various domains, including healthcare and finance. For some tasks, LLMs achieve similar or better performance than trained human beings, therefore it is reasonable to employ human exams (e.g., certification tests) to assess the performance of LLMs. We present a comprehensive evaluation of popular LLMs, such as Llama 2 and GPT, on their ability to answer agriculture-related questions. In our evaluation, we also employ RAG (Retrieval-Augmented Generation) and ER (Ensemble Refinement) techniques, which combine information retrieval, generation capabilities, and prompting strategies to improve the LLMs' performance. To demonstrate the capabilities of LLMs, we selected agriculture exams and benchmark datasets from three of the largest agriculture producer countries: Brazil, India, and the USA. Our analysis highlights GPT-4's ability to achieve a passing score on exams to earn credits for renewing agronomist certifications, answering 93% of the questions correctly and outperforming earlier general-purpose models, which achieved 88% accuracy. On one of our experiments, GPT-4 obtained the highest performance when compared to human subjects. This performance suggests that GPT-4 could potentially pass on major graduate education admission tests or even earn credits for renewing agronomy certificates. We also explore the models' capacity to address general agriculture-related questions and generate crop management guidelines for Brazilian and Indian farmers, utilizing robust datasets from the Brazilian Agency of Agriculture (Embrapa) and graduate program exams from India. The results suggest that GPT-4, ER, and RAG can contribute meaningfully to agricultural education, assessment, and crop management practice, offering valuable insights to farmers and agricultural professionals.
    NECO: NEural Collapse Based Out-of-distribution detection. (arXiv:2310.06823v2 [stat.ML] UPDATED)
    Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that "neural collapse", a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of "neural collapse" and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.
    Locality-Aware Generalizable Implicit Neural Representation. (arXiv:2310.05624v2 [cs.LG] UPDATED)
    Generalizable implicit neural representation (INR) enables a single continuous function, i.e., a coordinate-based neural network, to represent multiple data instances by modulating its weights or intermediate features using latent codes. However, the expressive power of the state-of-the-art modulation is limited due to its inability to localize and capture fine-grained details of data entities such as specific pixels and rays. To address this issue, we propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder. The transformer encoder predicts a set of latent tokens from a data instance to encode local information into each latent token. The locality-aware INR decoder extracts a modulation vector by selectively aggregating the latent tokens via cross-attention for a coordinate input and then predicts the output by progressively decoding with coarse-to-fine modulation through multiple frequency bandwidths. The selective token aggregation and the multi-band feature modulation enable us to learn locality-aware representation in spatial and spectral aspects, respectively. Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks such as image generation.
    Defending Our Privacy With Backdoors. (arXiv:2310.08320v1 [cs.LG])
    The proliferation of large AI models trained on uncurated, often sensitive web-scraped data has raised significant privacy concerns. One of the concerns is that adversaries can extract information about the training data using privacy attacks. Unfortunately, the task of removing specific information from the models without sacrificing performance is not straightforward and has proven to be challenging. We propose a rather easy yet effective defense based on backdoor attacks to remove private information such as names of individuals from models, and focus in this work on text encoders. Specifically, through strategic insertion of backdoors, we align the embeddings of sensitive phrases with those of neutral terms, for example "a person" instead of the person's name. Our empirical results demonstrate the effectiveness of our backdoor-based defense on CLIP by assessing its performance using a specialized privacy attack for zero-shot classifiers. Our approach provides not only a new "dual-use" perspective on backdoor attacks, but also presents a promising avenue to enhance the privacy of individuals within models trained on uncurated web-scraped data.
    On Regularized Sparse Logistic Regression. (arXiv:2309.05925v2 [cs.LG] UPDATED)
    Sparse logistic regression performs classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with a nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to a nonconvex regularization term, as long as a certain requirement is satisfied. In addition, we also utilize a different line search criterion to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.
    Rethinking Negative Pairs in Code Search. (arXiv:2310.08069v1 [cs.SE])
    Recently, contrastive learning has become a key component in fine-tuning code search models for software development efficiency and effectiveness. It pulls together positive code snippets while pushing negative samples away given search queries. Among contrastive learning methods, InfoNCE is the most widely used loss function due to its better performance. However, the following problems in negative samples of InfoNCE may deteriorate its representation learning: 1) The existence of false negative samples in large code corpora due to duplications. 2) The failure to explicitly differentiate between the potential relevance of negative samples. As an example, for a quick sort query, a bubble sort implementation is less "negative" than a file saving function. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE. Theoretically, we analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deducing a more precise mutual information estimation. We furthermore discuss the superiority of the proposed loss functions over other design alternatives. Extensive experiments demonstrate the effectiveness of Soft-InfoNCE and weights estimation methods under state-of-the-art code search models on a large-scale public dataset consisting of six programming languages. Source code is available at https://github.com/Alex-HaochenLi/Soft-InfoNCE.
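To make the "weight terms inserted into InfoNCE" idea concrete, here is a minimal sketch for a single query. The exact placement of the weights is an assumption of this sketch (each negative's exponentiated score is multiplied by a weight in the denominator); the abstract does not give the formula, and the function name `soft_info_nce` is hypothetical. With all weights equal to 1 the expression reduces to the ordinary InfoNCE loss, matching the special-case claim in the abstract.

```python
import math


def soft_info_nce(pos_score, neg_scores, neg_weights=None):
    """Weighted InfoNCE loss for one query (illustrative sketch only).

    pos_score   -- similarity between the query and its positive snippet
    neg_scores  -- similarities between the query and each negative snippet
    neg_weights -- one non-negative weight per negative; None means all 1.0,
                   which gives the vanilla InfoNCE loss
    """
    if neg_weights is None:
        neg_weights = [1.0] * len(neg_scores)
    if len(neg_weights) != len(neg_scores):
        raise ValueError("need exactly one weight per negative score")
    pos = math.exp(pos_score)
    # Assumed form: each negative term is scaled by its weight.
    neg = sum(w * math.exp(s) for w, s in zip(neg_weights, neg_scores))
    return -math.log(pos / (pos + neg))
```

Under this assumed form, lowering the weight of a likely false negative (for example a near-duplicate of the positive snippet) shrinks its contribution to the denominator and therefore lowers the penalty for scoring it highly.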
    Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning. (arXiv:2308.16198v2 [cs.LG] UPDATED)
    In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Partially Observable Stochastic Game (POSG) formulation for information dissemination empowering each agent to decide on message forwarding independently, based on their one-hop neighborhood. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches, by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach is a first step toward supporting the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination.
    Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction. (arXiv:2308.03807v2 [eess.IV] UPDATED)
    Proximal gradient-based optimization is one of the most common strategies for solving image inverse problems, and it is easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on the second Nesterov proximal gradient optimization. The proposed end-to-end network not only has a powerful learning ability for high-/low-frequency image features, but also can theoretically guarantee that geometric texture details will be reconstructed from preliminary linear reconstruction. Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules including general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration, and post-processing. In the image restoration step, a cascade geometric incremental learning module is designed to compensate for missing texture information from different geometric spectral decomposition domains. Inspired by the overlap-tile strategy, we also develop a post-processing module to remove the block effect in patch-wise-based natural image reconstruction. All parameters in the proposed model are learnable, and an adaptive initialization technique for the physical parameters is also employed to make the model flexible and ensure smooth convergence. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.
    COVID-19 Detection Using Swin Transformer Approach from Computed Tomography Images. (arXiv:2310.08165v1 [eess.IV])
    The accurate and efficient diagnosis of COVID-19 is of paramount importance, particularly in the context of large-scale medical imaging datasets. In this preprint paper, we propose a novel approach for COVID-19 diagnosis using CT images that leverages the power of Swin Transformer models, state-of-the-art solutions in computer vision tasks. Our method includes a systematic approach for patient-level predictions, where individual CT slices are classified as COVID-19 or non-COVID, and the patient's overall diagnosis is determined through majority voting. The application of the Swin Transformer in this context results in patient-level predictions that demonstrate exceptional diagnostic accuracy. In terms of evaluation metrics, our approach consistently outperforms the baseline, as well as numerous competing methods, showcasing its effectiveness in COVID-19 diagnosis. The macro F1 score achieved by our model exceeds the baseline and offers a robust solution for accurate diagnosis.
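The patient-level step described above (classify every CT slice, then take a majority vote) can be sketched as follows. The slice classifier itself is out of scope here; the function name, the 0/1 label encoding, and the tie-breaking rule are assumptions of this sketch rather than details from the abstract.

```python
def patient_level_prediction(slice_preds):
    """Majority vote over per-slice predictions for one patient.

    slice_preds -- list of 0/1 labels, 1 meaning the slice was classified
                   as COVID-19 and 0 meaning non-COVID
    Ties are resolved as positive (1); this is an assumption of the sketch.
    """
    if not slice_preds:
        raise ValueError("need at least one slice prediction")
    positives = sum(slice_preds)
    return 1 if 2 * positives >= len(slice_preds) else 0
```

For example, a patient whose slices were labelled `[1, 1, 0]` is reported as positive, and one with `[0, 0, 1]` as negative.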
    Learn From Model Beyond Fine-Tuning: A Survey. (arXiv:2310.08184v1 [cs.AI])
    Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at .
    Emergence of Latent Binary Encoding in Deep Neural Network Classifiers. (arXiv:2310.08224v1 [cs.LG])
    We observe the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as neural collapse, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.
    On Training Derivative-Constrained Neural Networks. (arXiv:2310.01649v2 [cs.LG] UPDATED)
    We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated ReLU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
    Generalization bounds for neural ordinary differential equations and deep residual networks. (arXiv:2305.06648v2 [stat.ML] UPDATED)
    Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields in particular a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.
    Asynchronous Evolution of Deep Neural Network Architectures. (arXiv:2308.04102v2 [cs.NE] UPDATED)
    Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M \ll K$ individuals have been evaluated. A suitable value for $M$ is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated in eight-line sorting network design (a single-population optimization task with limited evaluation-time variability), achieving an over two-fold speedup. Next, it was evaluated in 11-bit multiplexer design (a single-population discovery task with extended variability), where a 14-fold speedup was observed. It was then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task), resulting in an over two-fold speedup. In all problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.
    Measuring Feature Sparsity in Language Models. (arXiv:2310.07837v1 [cs.LG])
    Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.
    DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning. (arXiv:2309.05173v2 [cs.CL] UPDATED)
    Prompt tuning (PT), where a small amount of trainable soft (continuous) prompt vectors is affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
    Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v2 [stat.ML] UPDATED)
    In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
    A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness. (arXiv:2309.03004v2 [cs.LG] UPDATED)
    A recent empirical observation (Li et al., 2022b) of activation sparsity in MLP blocks offers an opportunity to drastically reduce computation costs for free. Although they attribute it to training dynamics, existing theoretical explanations of activation sparsity are restricted to shallow networks, small training steps and special training, despite its emergence in deep models standardly trained for a large number of steps. To fill these gaps, we propose the notion of gradient sparsity as one source of activation sparsity and a theoretical explanation based on it that sees sparsity as a necessary step to adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed MLPs, and further to Transformers or other architectures trained with weight noises. Eliminating other sources of flatness except for sparsity, we discover the phenomenon that the ratio between the largest and smallest non-zero singular values of weight matrices is small. When discussing the emergence of this spectral concentration, we use random matrix theory (RMT) as a powerful tool to analyze stochastic gradient noises. Validation experiments are conducted to verify our gradient-sparsity-based explanation. We propose two plug-and-play modules for both training and finetuning for sparsity. Experiments on ImageNet-1k and C4 demonstrate their 50% sparsity improvements, indicating further potential cost reduction in both training and inference.
    Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative Communications. (arXiv:2310.07987v1 [cs.NI])
    This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Based on the theoretical basis for cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.
    On the Security Vulnerabilities of Text-to-SQL Models. (arXiv:2211.15363v3 [cs.CL] UPDATED)
    Although it has been demonstrated that Natural Language Processing (NLP) algorithms are vulnerable to deliberate attacks, the question of whether such weaknesses can lead to software security threats is under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL systems that are commonly used to create natural language interfaces to databases. We showed that the Text-to-SQL modules within six commercial applications can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service attacks. This is the first demonstration that NLP models can be exploited as attack vectors in the wild. In addition, experiments using four open-source language models verified that straightforward backdoor attacks on Text-to-SQL systems achieve a 100% success rate without affecting their performance. The aim of this work is to draw the community's attention to potential software security issues associated with NLP algorithms and encourage exploration of methods to mitigate against them.
    Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery. (arXiv:2305.14259v3 [cs.CL] UPDATED)
    Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses. Standard LBD is limited to predicting pairwise relations between discrete concepts (e.g., drug-disease links), and ignores critical contexts like experimental settings (e.g., a specific patient population where a drug is evaluated) and background motivations (e.g., to find drugs without specific side effects). We address these limitations with a novel formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space. We present a modeling framework using retrieval of "inspirations" from past scientific papers. Our evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our inspiration prompting approaches partially mitigate this issue. Our work represents a first step toward building language models that generate new ideas derived from scientific literature.
    Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning. (arXiv:2310.07996v1 [cs.LG])
    This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism -- the repeated resetting of weights in the last layer, which we nickname "zapping" -- was originally designed for a meta-continual-learning procedure, yet we show it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we wish to transfer a pre-trained image classifier to a new set of classes, in a few shots. We show that our zapping procedure results in improved transfer accuracy and/or more rapid adaptation in both standard fine-tuning and continual learning settings, while being simple to implement and computationally efficient. In many cases, we achieve performance on par with state of the art meta-learning without needing the expensive higher-order gradients, by using a combination of zapping and sequential learning. An intuitive explanation for the effectiveness of this zapping procedure is that representations trained with repeated zapping learn features that are capable of rapidly adapting to newly initialized classifiers. Such an approach may be considered a computationally cheaper type of, or alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
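As a rough illustration of "zapping", the sketch below re-initialises the last-layer weights at a fixed interval while the rest of the model keeps training. Everything concrete here is an assumption made for illustration: the dictionary-based model layout, the `train_step` callback, the Gaussian re-initialisation scale, and the fixed `zap_every` schedule are not specified in the abstract.

```python
import random


def zap_last_layer(last_layer, rng, scale=0.01):
    """Overwrite every last-layer weight in place with a fresh random value."""
    for row in last_layer:
        for j in range(len(row)):
            row[j] = rng.gauss(0.0, scale)


def pretrain_with_zapping(model, batches, train_step, zap_every, rng):
    """Run training steps, resetting model["last_layer"] every zap_every steps.

    model      -- dict holding at least a "last_layer" weight matrix (list of rows)
    train_step -- caller-supplied function (model, batch) -> None (hypothetical)
    Returns the number of resets performed.
    """
    if zap_every <= 0:
        raise ValueError("zap_every must be positive")
    zaps = 0
    for step, batch in enumerate(batches):
        if step % zap_every == 0:
            zap_last_layer(model["last_layer"], rng)
            zaps += 1
        train_step(model, batch)
    return zaps
```

Only the last layer is touched by the reset, so earlier layers are repeatedly forced to work with a freshly initialised classifier, which is the intuition the abstract gives for why the learned features adapt quickly.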
    Finite Scalar Quantization: VQ-VAE Made Simple. (arXiv:2309.15505v2 [cs.CV] UPDATED)
    We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the number of dimensions and values each dimension can take, we obtain the same codebook size as in VQ. On top of such discrete representations, we can train the same models that have been trained on VQ-VAE representations. For example, autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Concretely, we employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation. Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.
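The quantization scheme described above can be sketched in a few lines: each latent dimension is mapped to one of a small number of fixed values, and the implicit codebook is the product of those per-dimension sets. The `tanh` bounding step, the function names, and the example level counts are assumptions of this sketch; the gradient handling needed for training is omitted.

```python
import math


def fsq_quantize(z, levels):
    """Map each coordinate of z to an integer code in 0 .. levels[i] - 1."""
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2
        bounded = math.tanh(value) * half      # squash into [-half, half]
        codes.append(round(bounded + half))    # nearest of n_levels fixed values
    return codes


def codebook_index(codes, levels):
    """Flatten per-dimension codes into one index (mixed-radix encoding)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        index = index * n_levels + code
    return index


def codebook_size(levels):
    """Size of the implicit codebook: the product of the per-dimension levels."""
    return math.prod(levels)
```

With four dimensions and `levels = [8, 5, 5, 5]`, for instance, the implicit codebook has 8 * 5 * 5 * 5 = 1000 entries without any learned codebook vectors.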
    Neural Diffusion Models. (arXiv:2310.08337v1 [cs.LG])
    Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, a broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
    Pure Monte Carlo Counterfactual Regret Minimization. (arXiv:2309.03084v2 [cs.AI] UPDATED)
    Counterfactual Regret Minimization (CFR) and its variants are the best algorithms so far for solving large-scale incomplete information games. However, we believe that there are two problems with CFR. First, matrix multiplication is required in each CFR iteration, and the time complexity of one iteration is too high. Second, game characteristics in the real world differ, so a single CFR algorithm will not be perfectly suitable for all game problems. To address these two problems, this paper proposes a new algorithm called Pure CFR (PCFR) based on CFR. PCFR can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR, and using the best response strategy instead of the regret matching strategy for the next iteration. This algorithm has three advantages. First, PCFR can be combined with any CFR variant. The resulting Pure MCCFR (PMCCFR) can significantly reduce the time and space complexity of one iteration. Second, our experiments show that the convergence speed of PMCCFR is 2 to 3 times that of MCCFR. Finally, there is a type of game that is very suitable for PCFR. We call this type of game a clear-game; it is characterized by a high proportion of dominated strategies. Experiments show that in clear-games, the convergence rate of PMCCFR is two orders of magnitude higher than that of MCCFR.
    A Carbon Tracking Model for Federated Learning: Impact of Quantization and Sparsification. (arXiv:2310.08087v1 [eess.SP])
    Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, suggesting also general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 Kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performances and energy consumption, leading to sustainable FL designs.
    Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey. (arXiv:2310.07745v1 [cs.LG])
    The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber-defence, numerous challenges must be overcome before DRL can be applied to autonomous cyber-operations (ACO) at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACO problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACO-DRL agent. We provide: i.) A summary of the domain properties that define the ACO problem; ii.) A comprehensive evaluation of the extent to which domains used for benchmarking DRL approaches are comparable to ACO; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACO. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACO.
    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains. (arXiv:2310.07838v1 [cs.LG])
    We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enables the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.
    Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. (arXiv:2307.03486v2 [cs.LG] UPDATED)
    Discovering achievements with a hierarchical structure in procedurally generated environments presents a significant challenge. This requires an agent to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods have been built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be advantageous for learning hierarchical dependencies. However, these methods demand an excessive number of environment interactions or large model sizes, limiting their practicality. In this work, we demonstrate that proximal policy optimization (PPO), a simple yet versatile model-free algorithm, outperforms previous methods when optimized with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, albeit with limited confidence. Based on this observation, we introduce a novel contrastive learning method, called achievement distillation, which strengthens the agent's ability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment in a sample-efficient manner while utilizing fewer model parameters.
    Conformal inference for regression on Riemannian Manifolds. (arXiv:2310.08209v1 [stat.ML])
    Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.
    Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets. (arXiv:2310.04413v2 [cs.LG] UPDATED)
    Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of suboptimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to "good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
    Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability. (arXiv:2302.03770v2 [cs.LG] UPDATED)
    Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals. In particular, offline GCRL only requires purely pre-collected datasets to perform training tasks without additional interactions with the environment. Although offline GCRL has become increasingly prevalent and many previous works have demonstrated its empirical success, the theoretical understanding of efficient offline GCRL algorithms is not well established, especially when the state space is huge and the offline dataset only covers the policy we aim to learn. In this paper, we provide a rigorous theoretical analysis of an existing empirically successful offline GCRL algorithm. We prove that under slight modification, this algorithm enjoys an $\widetilde{O}(\text{poly}(1/\epsilon))$ sample complexity (where $\epsilon$ is the desired suboptimality of the learned policy) with general function approximation thanks to the property of (semi-)strong convexity of the objective functions. We only require nearly minimal assumptions on the dataset (single-policy concentrability) and the function class (realizability). Moreover, this algorithm consists of two uninterleaved optimization steps, which we refer to as $V$-learning and policy learning, and is computationally stable since it does not involve minimax optimization. We also empirically validate our theory by showing that the modified algorithm outperforms the previous algorithm in various real-world environments. To the best of our knowledge, this is the first algorithm that is both provably efficient with general function approximation and single-policy concentrability, and empirically successful without requiring solving minimax optimization problems.
    Analyzing And Editing Inner Mechanisms Of Backdoored Language Models. (arXiv:2302.12461v2 [cs.LG] UPDATED)
    Poisoning of data sets is a potential security threat to large language models that can lead to backdoored models. A description of the internal mechanisms of backdoored language models and how they process trigger inputs, e.g., when switching to toxic language, has yet to be found. In this work, we study the internal representations of transformer-based backdoored language models and determine early-layer MLP modules as most important for the backdoor mechanism in combination with the initial embedding projection. We use this knowledge to remove, insert, and modify backdoor mechanisms with engineered replacements that reduce the MLP module outputs to essentials for the backdoor mechanism. To this end, we introduce PCP ablation, where we replace transformer modules with low-rank matrices based on the principal components of their activations. We demonstrate our results on backdoored toy, backdoored large, and non-backdoored open-source models. We show that we can improve the backdoor robustness of large language models by locally constraining individual modules during fine-tuning on potentially poisonous data sets. Trigger warning: Offensive language.
    Quantum-Enhanced Forecasting: Leveraging Quantum Gramian Angular Field and CNNs for Stock Return Predictions. (arXiv:2310.07427v2 [cs.LG] UPDATED)
    We propose a time series forecasting method named Quantum Gramian Angular Field (QGAF). This approach merges the advantages of quantum computing technology with deep learning, aiming to enhance the precision of time series classification and forecasting. We successfully transformed stock return time series data into two-dimensional images suitable for Convolutional Neural Network (CNN) training by designing specific quantum circuits. Distinct from the classical Gramian Angular Field (GAF) approach, QGAF's uniqueness lies in eliminating the need for data normalization and inverse cosine calculations, simplifying the transformation process from time series data to two-dimensional images. To validate the effectiveness of this method, we conducted experiments on datasets from three major stock markets: the China A-share market, the Hong Kong stock market, and the US stock market. Experimental results revealed that compared to the classical GAF method, the QGAF approach significantly improved time series prediction accuracy, reducing prediction errors by an average of 25% for Mean Absolute Error (MAE) and 48% for Mean Squared Error (MSE). This research confirms the potential and promising prospects of integrating quantum computing with deep learning techniques in financial time series forecasting.
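For context, the classical Gramian Angular Field that QGAF is compared against can be sketched as below. This shows the commonly used summation variant (rescale to $[-1, 1]$, take the inverse cosine, then form $\cos(\phi_i + \phi_j)$); it is not the quantum circuit from the paper, and the function name is hypothetical.

```python
import numpy as np

def gramian_angular_summation_field(series):
    """Classical GAF (summation variant) of a 1-D series as a 2-D image."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Normalization step: map the series linearly onto [-1, 1].
    scaled = np.clip((2.0 * x - hi - lo) / (hi - lo), -1.0, 1.0)
    # Inverse cosine step: encode each value as a polar angle.
    phi = np.arccos(scaled)
    # Entry (i, j) is cos(phi_i + phi_j), giving a symmetric image.
    return np.cos(phi[:, None] + phi[None, :])

image = gramian_angular_summation_field([0.0, 1.0, 2.0])
```

The two commented steps, normalization and `arccos`, are the ones the abstract says QGAF eliminates.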
    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v1 [cs.LG])
    Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making processes. For example, to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.
    NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining. (arXiv:2310.07402v2 [cs.LG] UPDATED)
    Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical scales to high-dimensional vectors, we propose a numerically multi-scaled embedding module enumerating all possible scales for the scalar values. The model undergoes pretraining using the proposed numerically multi-scaled embedding with a simple contrastive objective on a large-scale dataset containing over a million sequences. We study its transfer performance on a number of univariate and multivariate classification benchmarks. Our method exhibits remarkable improvement against previous representation learning approaches and establishes the new state of the art, even compared with domain-specific non-learning-based methods.
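The windowing step described in the abstract (non-overlapping windows, each summarized by a normalized shape plus its mean and standard deviation) can be sketched as follows. This is an illustrative sketch only: the function name, the `eps` stabilizer, and the choice to drop a trailing partial window are assumptions, and the multi-scaled embedding itself is not shown.

```python
import numpy as np

def window_features(series, window, eps=1e-8):
    """Split a 1-D series into non-overlapping windows and describe each one."""
    x = np.asarray(series, dtype=float)
    usable = (x.size // window) * window  # assumption: drop any partial window
    windows = x[:usable].reshape(-1, window)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    # Normalized shape: zero mean and (near) unit variance within each window.
    shape = (windows - mean) / (std + eps)
    return shape, mean[:, 0], std[:, 0]

shape, mean, std = window_features([0.0, 2.0, 4.0, 8.0], window=2)
```

Separating the scale-free shape from the two scalars leaves the raw magnitudes in `mean` and `std`, which are the quantities the abstract says the numerically multi-scaled embedding is designed to handle.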
    MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition. (arXiv:2210.09222v2 [cs.CV] UPDATED)
    Multimodal sensors provide complementary information to develop accurate machine-learning methods for human activity recognition (HAR), but introduce significantly higher computational load, which reduces efficiency. This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs) called Multimodal Temporal Segment Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a temporal and structure-preserving gray-scale image using the Gramian Angular Field (GAF), representing the inherent properties of human activities. MMTSA then applies a multimodal sparse sampling method to reduce data redundancy. Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal fusion. Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR. Results show that our method improves the cross-subject F1-score on the MMAct dataset by 11.13% over the previous state-of-the-art (SOTA) methods. The ablation study and analysis demonstrate MMTSA's effectiveness in fusing multimodal data for accurate HAR. The efficiency evaluation on an edge device showed that MMTSA achieved significantly better accuracy, lower computational load, and lower inference latency than SOTA methods.
    A Comprehensive Review on Tree Detection Methods Using Point Cloud and Aerial Imagery from Unmanned Aerial Vehicles. (arXiv:2309.16375v2 [cs.CV] CROSS LISTED)
    Unmanned Aerial Vehicles (UAVs) are considered cutting-edge technology with highly cost-effective and flexible usage scenarios. Although many papers have reviewed the application of UAVs in agriculture, reviews of their application to tree detection are still insufficient. This paper focuses on tree detection methods applied to data collected by UAVs. There are two kinds of data, the point cloud and the images, which are acquired by the Light Detection and Ranging (LiDAR) sensor and camera, respectively. Among the detection methods using point-cloud data, this paper mainly classifies these methods according to LiDAR and Digital Aerial Photography (DAP). For the detection methods using images directly, this paper reviews these methods according to whether or not they use Deep Learning (DL). Our review summarizes and analyzes the comparison and combination between the application of LiDAR-based and DAP-based point cloud data. The performance, relative merits, and application fields of the methods are also introduced. Meanwhile, this review counts the number of tree detection studies using different methods in recent years. From our statistics, detection using DL methods on images has become a mainstream trend, as the number of DL-based detection studies has increased to 45% of the total number of tree detection studies up to 2022. As a result, this review could help and guide researchers who want to carry out tree detection on specific forests and farmers who use UAVs in managing agricultural production.
    Core-sets for Fair and Diverse Data Summarization. (arXiv:2310.08122v1 [cs.DS])
    We study core-set construction algorithms for the task of Diversity Maximization under fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural diversity measures: sum-of-pairwise distances and sum-of-nearest-neighbor distances, and show improved core-set construction algorithms with respect to these measures. More precisely, we show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. Second, we show the first core-set w.r.t. the sum-of-nearest-neighbor distances. Finally, we run several experiments showing the effectiveness of our core-set approach. In particular, we apply constrained diversity maximization to summarize a set of timed messages that takes into account the messages' recency. Specifically, the summary should include more recent messages compared to older ones. This is a real task in one of the largest communication platforms, affecting the experience of hundreds of millions daily active users. By utilizing our core-set method for this task, we achieve a 100x speed-up while losing the diversity by only a few percent. Moreover, our approach allows us to improve the space usage of the algorithm in the streaming setting.
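The two diversity measures named in the abstract can be computed directly for a small point set. This minimal sketch assumes Euclidean distance (the paper works in a general metric space) and uses hypothetical function names; it evaluates the objectives only and does not implement the core-set construction.

```python
import numpy as np

def distance_matrix(points):
    """All pairwise Euclidean distances between the rows of `points`."""
    p = np.asarray(points, dtype=float)
    diff = p[:, None, :] - p[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sum_of_pairwise_distances(points):
    """Sum of distances over all unordered pairs of selected points."""
    d = distance_matrix(points)
    return d[np.triu_indices(d.shape[0], k=1)].sum()

def sum_of_nearest_neighbor_distances(points):
    """Sum, over each point, of the distance to its closest other point."""
    d = distance_matrix(points)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return d.min(axis=1).sum()

selected = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
pairwise = sum_of_pairwise_distances(selected)
nearest = sum_of_nearest_neighbor_distances(selected)
```

For this 3-4-5 triangle the pairwise measure is 3 + 4 + 5 = 12, while the nearest-neighbor measure is 3 + 3 + 4 = 10; the second measure needs at least two points to be defined.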
    MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning. (arXiv:2310.08252v1 [cs.LG])
    Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
    Generative modeling of time-dependent densities via optimal transport and projection pursuit. (arXiv:2304.09663v2 [stat.ML] UPDATED)
    Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.
    An interpretable neural network-based non-proportional odds model for ordinal regression. (arXiv:2303.17823v3 [stat.ME] UPDATED)
    This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from conventional approaches to ordinal regression with non-proportional models in several ways: (1) N$^3$POM is designed to directly handle continuous responses, whereas standard methods typically treat de facto ordered continuous variables as discrete, (2) instead of estimating response-dependent finite coefficients of linear models from discrete responses as is done in conventional approaches, we train a non-linear neural network to serve as a coefficient function. Thanks to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region in the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for effectively training the neural network. We apply N$^3$POM to several real-world datasets.
    Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts. (arXiv:2302.02931v2 [cs.LG] UPDATED)
    Training machine learning models robust to distribution shifts is critical for real-world applications. Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points. Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative, since they naively upweight high loss points which may form a contrived set that does not correspond to any meaningful group in the real world (e.g., when the high loss points are randomly mislabeled training points). In this work, we address limitations in prior approaches by assuming a more nuanced form of group shift: conditioned on the label, we assume that the true group function (indicator over group) is simple. For example, we may expect that group shifts occur along low bitrate features (e.g., image background, lighting). Thus, we aim to learn a model that maintains high accuracy on simple group functions realized by these low bitrate features, that need not spend valuable model capacity achieving high accuracy on contrived groups of examples. Based on this, we consider the two-player game formulation of DRO where the adversary's capacity is bitrate-constrained. Our resulting practical algorithm, Bitrate-Constrained DRO (BR-DRO), does not require group information on training samples yet matches the performance of Group DRO on datasets that have training group annotations and that of CVaR DRO on long-tailed distributions. Our theoretical analysis reveals that in some settings BR-DRO objective can provably yield statistically efficient and less conservative solutions than unconstrained CVaR DRO.
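The remark that CVaR DRO upweights high-loss points can be made concrete with an empirical CVaR objective: the average loss over the worst $\alpha$ fraction of training points. This is a generic sketch of that baseline objective, not BR-DRO; the function name and the rounding-up of the tail size are assumptions.

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Mean of the largest ceil(alpha * n) per-example losses."""
    values = np.asarray(losses, dtype=float)
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    tail_size = max(1, int(np.ceil(alpha * values.size)))
    # Only the highest-loss points contribute; all others get zero weight.
    worst = np.sort(values)[-tail_size:]
    return worst.mean()

tail_mean = empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5)
full_mean = empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=1.0)
```

Because the tail is chosen purely by loss value, a handful of mislabeled points can dominate the objective, which is the conservativeness the abstract attributes to CVaR DRO.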
    Explainable Attention for Few-shot Learning and Beyond. (arXiv:2310.07800v1 [cs.AI])
    Attention mechanisms have exhibited promising potential in enhancing learning models by identifying salient portions of input data. This is particularly valuable in scenarios where limited training samples are accessible due to challenges in data collection and labeling. Drawing inspiration from human recognition processes, we posit that an AI baseline's performance could be more accurate and dependable if it is exposed to essential segments of raw data rather than the entire input dataset, akin to human perception. However, the task of selecting these informative data segments, referred to as hard attention finding, presents a formidable challenge. In situations with few training samples, existing studies struggle to locate such informative regions due to the large number of training parameters that cannot be effectively learned from the available limited samples. In this study, we introduce a novel and practical framework for achieving explainable hard attention finding, specifically tailored for few-shot learning scenarios, called FewXAT. Our approach employs deep reinforcement learning to implement the concept of hard attention, directly impacting raw input data and thus rendering the process interpretable for human understanding. Through extensive experimentation across various benchmark datasets, we demonstrate the efficacy of our proposed method.
    Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach. (arXiv:2310.07981v1 [cs.LG])
    A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to the flow control with the aim of improving productivity, existing methods depend on a heuristic design by domain human experts. Therefore, the methods require correction, monitoring, and verification by using real equipment. As system designs increase in complexity, the monitoring time increases, which decreases the probability of arriving at the optimal design. As an alternative approach to the heuristic design of flow control systems, the use of deep reinforcement learning to solve the scheduling optimization problem has been considered. Although the existing research on reinforcement learning has yielded excellent performance in some areas, the applicability of the results to actual FAB such as display and semiconductor manufacturing processes is not evident so far. To this end, we propose a method to implement a physical simulation environment and devise a feasible flow control system design using a transfer robot in display manufacturing through reinforcement learning. We present a model and parameter setting to build a virtual environment for different display transfer robots, and training methods of reinforcement learning on the environment to obtain an optimal scheduling of glass flow control systems. Its feasibility was verified by using different types of robots used in the actual process.
    Identifying latent distances with Finslerian geometry. (arXiv:2212.10010v2 [cs.LG] UPDATED)
    Riemannian geometry provides us with powerful tools to explore the latent space of generative models while preserving the underlying structure of the data. The latent space can be equipped with a Riemannian metric, pulled back from the data manifold. With this metric, we can systematically navigate the space relying on geodesics defined as the shortest curves between two points. Generative models are often stochastic, causing the data space, the Riemannian metric, and the geodesics to be stochastic as well. Stochastic objects are at best impractical, and at worst impossible, to manipulate. A common solution is to approximate the stochastic pullback metric by its expectation. But the geodesics derived from this expected Riemannian metric do not correspond to the expected length-minimising curves. In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric. We show this metric defines a Finsler metric, and we compare it with the expected Riemannian metric. In high dimensions, we prove that both metrics converge to each other at a rate of $O\left(\frac{1}{D}\right)$. This convergence implies that the established expected Riemannian metric is an accurate approximation of the theoretically more grounded Finsler metric. This provides justification for using the expected Riemannian metric for practical implementations.
    Theoretical Hardness and Tractability of POMDPs in RL with Partial Online State Information. (arXiv:2306.08762v2 [cs.LG] UPDATED)
    Partially observable Markov decision processes (POMDPs) have been widely applied to capture many real-world applications. However, existing theoretical results have shown that learning in general POMDPs could be intractable, where the main challenge lies in the lack of latent state information. A key fundamental question here is how much online state information (OSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full OSI, we need an exponentially scaling sample complexity to obtain an $\epsilon$-optimal policy solution for POMDPs. Nonetheless, inspired by the key insights in our lower bound design, we find that there exist important tractable classes of POMDPs even with only partial OSI. In particular, for two novel classes of POMDPs with partial OSI, we provide new algorithms that are proved to be near-optimal by establishing new regret upper and lower bounds.
    Infinite Width Graph Neural Networks for Node Regression/ Classification. (arXiv:2310.08176v1 [cs.LG])
    This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets on Graph structured data, when their width, that is, the number of nodes in each fully-connected layer, increases to infinity. Infinite Width Neural Networks connect Deep Learning to Gaussian Processes and Kernels, both Machine Learning Frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have far fewer hyperparameters than Neural Networks and can be used for uncertainty estimation, making them more user-friendly for applications. This work extends the growing body of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive Node Regression and Classification. Additionally, a Spectral Sparsification method known as Effective Resistance is used to improve runtime and memory requirements. Extending the setting to inductive graph learning tasks (Graph Regression/Classification) is straightforward and is briefly discussed in 3.5.
    Tight Time-Space Lower Bounds for Constant-Pass Learning. (arXiv:2310.08070v1 [cs.LG])
    In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning problems. Until recently, all these results considered learning in the streaming model, where each sample is drawn independently, and the learner is allowed a single pass over the stream of samples. Garg, Raz, and Tal [CCC'19] considered a stronger model, allowing multiple passes over the stream. In the $2$-pass model, they showed that learning parities of size $n$ requires either a memory of size $n^{1.5}$ or at least $2^{\sqrt{n}}$ samples. (Their result also generalizes to other learning problems.) In this work, for any constant $q$, we prove tight memory-sample lower bounds for any parity learning algorithm that makes $q$ passes over the stream of samples. We show that such a learner requires either $\Omega(n^{2})$ memory size or at least $2^{\Omega(n)}$ samples. Beyond establishing a tight lower bound, this is the first non-trivial lower bound for $q$-pass learning for any $q\ge 3$. Similar to prior work, our results extend to any learning problem with many nearly-orthogonal concepts. We complement the lower bound with an upper bound, showing that parity learning with $q$ passes can be done efficiently with $O(n^2/\log q)$ memory.
    AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE. (arXiv:2310.08012v1 [cs.LG])
    Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
    Impact of multi-armed bandit strategies on deep recurrent reinforcement learning. (arXiv:2310.08331v1 [stat.ML])
    Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL) is that an autonomous agent has to balance two contrasting needs in making its decisions: exploiting the current knowledge of the environment to maximize the cumulative reward, and exploring actions that improve the knowledge of the environment, hopefully leading to higher reward values (the exploration-exploitation trade-off). Concurrently, another relevant issue regards the full observability of the states, which may not be assumed in all applications, for example when only 2D images are considered as input in an RL approach used for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance the exploration-exploitation trade-off on partially observable systems for predicting steering wheels in an autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted and evaluated the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.
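The three exploration strategies named in this abstract can be sketched in a few lines. This is a minimal illustration over a plain list of Q-values, not the paper's Deep Recurrent Q-Network setup; the function names, the `temperature` parameter, and the Max-Boltzmann rule used here (greedy with probability 1 - epsilon, otherwise a Boltzmann sample) are assumptions based on the common textbook definitions.

```python
import math
import random


def greedy(q_values):
    """Index of the highest Q-value (first one on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; subtracting the max keeps exp() stable."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature, rng=random):
    """Sample an action in proportion to its Boltzmann probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def max_boltzmann(q_values, epsilon, temperature, rng=random):
    """Greedy with probability 1 - epsilon, otherwise a Boltzmann sample (assumed definition)."""
    if rng.random() < epsilon:
        return softmax_action(q_values, temperature, rng)
    return greedy(q_values)
```

Unlike epsilon-greedy, the two Boltzmann-based rules explore in proportion to the estimated values, so clearly bad actions are tried less often than near-optimal ones.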
    Generative Intrinsic Optimization: Intrinsic Control with Model Learning. (arXiv:2310.08100v1 [cs.LG])
    Future sequence represents the outcome after executing the action into the environment. When driven by the information-theoretic concept of mutual information, it seeks maximally informative consequences. Explicit outcomes may vary across state, return, or trajectory serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the necessary quantity for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens the possibilities of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty of the environment into decision-making.
    Efficient Integrators for Diffusion Generative Models. (arXiv:2310.07894v1 [cs.LG])
    Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at https://github.com/mandt-lab/PSLD.
    The Thousand Faces of Explainable AI Along the Machine Learning Life Cycle: Industrial Reality and Current State of Research. (arXiv:2310.07882v1 [cs.LG])
    In this paper, we investigate the practical relevance of explainable artificial intelligence (XAI) with a special focus on the producing industries and relate them to the current state of academic XAI research. Our findings are based on an extensive series of interviews regarding the role and applicability of XAI along the Machine Learning (ML) lifecycle in current industrial practice and its expected relevance in the future. The interviews were conducted among a great variety of roles and key stakeholders from different industry sectors. On top of that, we outline the state of XAI research by providing a concise review of the relevant literature. This enables us to provide an encompassing overview covering the opinions of the surveyed persons as well as the current state of academic research. By comparing our interview results with the current research approaches we reveal several discrepancies. While a multitude of different XAI approaches exists, most of them are centered around the model evaluation phase and data scientists. Their versatile capabilities for other stages are currently either not sufficiently explored or not popular among practitioners. In line with existing work, our findings also confirm that more efforts are needed to enable also non-expert users' interpretation and understanding of opaque AI models with existing methods and frameworks.
    Seeing-Eye Quadruped Navigation with Force Responsive Locomotion Control. (arXiv:2309.04370v2 [cs.RO] UPDATED)
    Seeing-eye robots are very useful tools for guiding visually impaired people, potentially producing a huge societal impact given the low availability and high cost of real guide dogs. Although a few seeing-eye robot systems have already been demonstrated, none considered external tugs from humans, which frequently occur in a real guide dog setting. In this paper, we simultaneously train a locomotion controller that is robust to external tugging forces via Reinforcement Learning (RL), and an external force estimator via supervised learning. The controller ensures stable walking, and the force estimator enables the robot to respond to the external forces from the human. These forces are used to guide the robot to the global goal, which is unknown to the robot, while the robot guides the human around nearby obstacles via a local planner. Experimental results in simulation and on hardware show that our controller is robust to external forces, and our seeing-eye system can accurately detect force direction. We demonstrate our full seeing-eye robot system on a real quadruped robot with a blindfolded human. The video can be seen at our project page: https://bu-air-lab.github.io/guide_dog/
    LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs. (arXiv:2308.08469v3 [cs.LG] UPDATED)
    In this work, we leverage pre-trained Large Language Models (LLMs) to enhance time-series forecasting. Mirroring the growing interest in unifying models for Natural Language Processing and Computer Vision, we envision creating an analogous model for long-term time-series forecasting. Due to limited large-scale time-series data for building robust foundation models, our approach LLM4TS focuses on leveraging the strengths of pre-trained LLMs. By combining time-series patching with temporal encoding, we have enhanced the capability of LLMs to handle time-series data effectively. Inspired by the supervised fine-tuning in chatbot domains, we prioritize a two-stage fine-tuning process: first conducting supervised fine-tuning to orient the LLM towards time-series data, followed by task-specific downstream fine-tuning. Furthermore, to unlock the flexibility of pre-trained LLMs without extensive parameter adjustments, we adopt several Parameter-Efficient Fine-Tuning (PEFT) techniques. Drawing on these innovations, LLM4TS has yielded state-of-the-art results in long-term forecasting. Our model has also shown exceptional capabilities as both a robust representation learner and an effective few-shot learner, thanks to the knowledge transferred from the pre-trained LLM.
    Multi-Objective Optimization for Sparse Deep Neural Network Training. (arXiv:2308.12243v2 [cs.LG] UPDATED)
    Different conflicting optimization criteria arise naturally in various Deep Learning scenarios. These can address different main tasks (i.e., in the setting of Multi-Task Learning), but also main and secondary tasks such as loss minimization versus sparsity. The usual approach is a simple weighting of the criteria, which formally only works in the convex setting. In this paper, we present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs) with respect to several tasks. By employing this scalarization technique, the algorithm can identify all optimal solutions of the original problem while reducing its complexity to a sequence of single-objective problems. The simplified problems are then solved using an Augmented Lagrangian method, enabling the use of popular optimization techniques such as Adam and Stochastic Gradient Descent, while efficaciously handling constraints. Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with a particular focus on Deep Multi-Task models, which are typically designed with a very large number of weights to perform equally well on multiple tasks. Through experiments conducted on two Machine Learning datasets, we demonstrate the possibility of adaptively sparsifying the model during training without significantly impacting its performance, if we are willing to apply task-specific adaptations to the network weights. Code is available at https://github.com/salomonhotegni/MDMTN.
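For orientation, the standard weighted Chebyshev scalarization turns a vector of objective values into a single number by taking the largest weighted distance to a reference point. The sketch below shows only that textbook form; the paper uses a modified variant that is not reproduced here, and the function name and the example candidates are hypothetical.

```python
def weighted_chebyshev(objectives, weights, reference):
    """Standard weighted Chebyshev scalarization: max_i w_i * |f_i - z_i|.

    objectives: objective values f_i of one candidate (e.g. task loss, sparsity penalty)
    weights:    non-negative trade-off weights w_i
    reference:  reference (ideal) point z_i, one entry per objective
    """
    if not (len(objectives) == len(weights) == len(reference)):
        raise ValueError("objectives, weights and reference must have equal length")
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, reference))


# Hypothetical candidates given as (loss, sparsity penalty) pairs.
candidates = {"dense": (0.10, 0.90), "balanced": (0.30, 0.35), "sparse": (0.80, 0.05)}
weights, reference = (0.5, 0.5), (0.0, 0.0)

# The candidate with the smallest scalarized value wins for this weight vector.
best = min(candidates, key=lambda k: weighted_chebyshev(candidates[k], weights, reference))
```

Sweeping the weight vector and re-solving yields the sequence of single-objective problems that the abstract refers to.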
    Continual Learning via Manifold Expansion Replay. (arXiv:2310.08038v1 [cs.LG])
    In continual learning, the learner learns multiple tasks in sequence, with data being acquired only once for each task. Catastrophic forgetting is a major challenge to continual learning. To reduce forgetting, some existing rehearsal-based methods use episodic memory to replay samples of previous tasks. However, in the process of knowledge integration when learning a new task, this strategy also suffers from catastrophic forgetting due to an imbalance between old and new knowledge. To address this problem, we propose a novel replay strategy called Manifold Expansion Replay (MaER). We argue that expanding the implicit manifold of the knowledge representation in the episodic memory helps to improve the robustness and expressiveness of the model. To this end, we propose a greedy strategy to keep increasing the diameter of the implicit manifold represented by the knowledge in the buffer during memory management. In addition, we introduce Wasserstein distance instead of cross entropy as distillation loss to preserve previous knowledge. With extensive experimental validation on MNIST, CIFAR10, CIFAR100, and TinyImageNet, we show that the proposed method significantly improves the accuracy in the continual learning setup, outperforming the state of the art.
    MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback. (arXiv:2309.10691v2 [cs.CL] UPDATED)
    To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance with single-turn exchanges, neglecting the nuanced interactions among the user, LLMs, and external tools, while also underestimating the importance of natural language feedback from users. These oversights contribute to discrepancies between research benchmark evaluations and real-world use cases. We introduce MINT, a benchmark that evaluates LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback. To ensure reproducibility, we provide an evaluation framework where LLMs can access tools by executing Python code and receive users' natural language feedback simulated by GPT-4. We repurpose a diverse set of established evaluation datasets focusing on reasoning, coding, and decision-making and carefully curate them into a compact subset for efficient evaluation. Our analysis of 20 open- and closed-source LLMs offers intriguing findings. (a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1-8% for each turn of tool use and 2-17% with natural language feedback. (b) Better single-turn performance does not guarantee better multi-turn performance. (c) Surprisingly, on the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities. We expect MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation can be less accessible compared to commercial LLMs with a larger user base.
    Explorable Mesh Deformation Subspaces from Unstructured Generative Models. (arXiv:2310.07814v1 [cs.GR])
    Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not relevant to the input shapes, and linear paths through them often lead to sub-optimal shape transitions. Furthermore, one would ideally be able to explore variations in the original high-quality meshes used to train the generative model, not its lower-quality output geometry. In this paper, we present a method to explore variations among a given set of landmark shapes by constructing a mapping from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model. We first describe how to find a mapping that spans the set of input landmark shapes and exhibits smooth variations between them. We then show how to turn the variations in this subspace into deformation fields, to transfer those variations to high-quality meshes for the landmark shapes. Our results show that our method can produce visually-pleasing and easily-navigable 2D exploration spaces for several different shape categories, especially as compared to prior work on learning deformation spaces for 3D shapes.
    Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples. (arXiv:2310.07747v1 [cs.LG])
    Learning transparent, interpretable controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability. Keywords: Interpretable Reinforcement Learning, Explainable Reinforcement Learning, Reinforcement Learning Transparency, Offline Reinforcement Learning, Batched Control.
    A Complete Recipe for Diffusion Generative Models. (arXiv:2303.01748v2 [cs.LG] UPDATED)
    Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at https://github.com/mandt-lab/PSLD.
    Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders. (arXiv:2310.08164v1 [cs.LG])
    Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications. However, how RLHF impacts LLM internals remains opaque. We propose a novel method to interpret learned reward functions in RLHF-tuned LLMs using sparse autoencoders. Our approach trains autoencoder sets on activations from a base LLM and its RLHF-tuned version. By comparing autoencoder hidden spaces, we identify unique features that reflect the accuracy of the learned reward model. To quantify this, we construct a scenario where the tuned LLM learns token-reward mappings to maximize reward. This is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs. Our method provides an abstract approximation of reward integrity. This presents a promising technique for ensuring alignment between specified objectives and model behaviors.
    Score Regularized Policy Optimization through Diffusion Behavior. (arXiv:2310.07297v2 [cs.LG] UPDATED)
    Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow because it necessitates tens to hundreds of iterative inference steps for one action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior distribution's score function during optimization. Our method enjoys powerful generative capabilities of diffusion modeling while completely circumventing the computationally intensive and time-consuming diffusion sampling scheme, both during training and evaluation. Extensive results on D4RL tasks show that our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks, while still maintaining state-of-the-art performance.
    Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling. (arXiv:2303.09033v2 [cs.LG] UPDATED)
    Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms. This naturally leads to suboptimal performance and higher regret due to variance overestimation. On the other hand, underestimated reward variances may lead to linear regret due to committing early to a suboptimal arm. This motivated prior works on variance-adaptive frequentist algorithms, which have strong instance-dependent regret bounds but cannot incorporate prior knowledge on reward variances. We lay foundations for the Bayesian setting, which incorporates prior knowledge. This results in lower regret in practice, due to using the prior in the algorithm design, and also improved regret guarantees. Specifically, we study Gaussian bandits with unknown heterogeneous reward variances, and develop a Thompson sampling algorithm with prior-dependent Bayes regret bounds. We achieve lower regret with lower reward variances and more informative priors on them, which is precisely why we pay only for what is uncertain. This is the first result of its kind. Finally, we corroborate our theory with extensive experiments, which show the superiority of our variance-adaptive Bayesian algorithm over prior frequentist approaches. We also show that our approach is robust to model misspecification and can be applied with estimated priors.
    Extreme Image Transformations Facilitate Robust Latent Object Representations. (arXiv:2310.07725v1 [cs.LG])
    Adversarial attacks can affect the object recognition capabilities of machines in the wild. These can often result from spurious correlations between input and class labels, and are prone to memorization in large networks. While networks are expected to do automated feature selection, it is not effective at the scale of the object. Humans, however, are able to select the minimum set of features required to form a robust representation of an object. In this work, we show that finetuning any pretrained off-the-shelf network with Extreme Image Transformations (EIT) not only helps in learning a robust latent representation, it also improves the performance of these networks against common adversarial attacks of various intensities. Our EIT trained networks show strong activations in the object regions even when tested with more intense noise, showing promising generalizations across different kinds of adversarial attacks.
    Physics Constrained Unsupervised Deep Learning for Rapid, High Resolution Scanning Coherent Diffraction Reconstruction. (arXiv:2306.11014v2 [physics.comp-ph] UPDATED)
    By circumventing the resolution limitations of optics, coherent diffractive imaging (CDI) and ptychography are making their way into scientific fields ranging from X-ray imaging to astronomy. Yet, the need for time consuming iterative phase recovery hampers real-time imaging. While supervised deep learning strategies have increased reconstruction speed, they sacrifice image quality. Furthermore, these methods' demand for extensive labeled training data is experimentally burdensome. Here, we propose an unsupervised physics-informed neural network reconstruction method, PtychoPINN, that retains the factor of 100-to-1000 speedup of deep learning-based reconstruction while improving reconstruction quality by combining the diffraction forward map with real-space constraints from overlapping measurements. In particular, PtychoPINN significantly advances generalizability, accuracy (with a typical 10 dB PSNR increase), and linear resolution (2- to 6-fold gain). This blend of performance and speed offers exciting prospects for high-resolution real-time imaging in high-throughput environments such as X-ray free electron lasers (XFELs) and diffraction-limited light sources.
    In-Context Unlearning: Language Models as Few Shot Unlearners. (arXiv:2310.07579v2 [cs.LG] UPDATED)
    Machine unlearning, the study of efficiently removing the impact of specific training points on the trained model, has garnered increased attention of late, driven by the need to comply with privacy regulations like the Right to be Forgotten. Although unlearning is particularly relevant for LLMs in light of the copyright issues they raise, achieving precise unlearning is computationally infeasible for very large models. To this end, recent work has proposed several algorithms which approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or when the LLM is accessed via API. In this work, we propose a new class of unlearning methods for LLMs we call "In-Context Unlearning", providing inputs in context and without having to update model parameters. To unlearn a particular training instance, we provide the instance alongside a flipped label and additional correctly labelled instances which are prepended as inputs to the LLM at inference time. Our experimental results demonstrate that these contexts effectively remove specific information from the training set while maintaining performance levels that are competitive with (or in some cases exceed) state-of-the-art unlearning methods that require access to the LLM parameters.
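The context construction described in this abstract (the instance to forget with a flipped label, followed by correctly labelled instances, all prepended to the query) might look roughly like the sketch below. The `Input:`/`Label:` template, the binary label set, and the function name are hypothetical choices made for illustration; the paper's actual prompt format is not given in the abstract.

```python
def build_unlearning_prompt(forget_text, forget_label, support, query,
                            labels=("positive", "negative")):
    """Assemble an in-context unlearning prompt for a binary classification task.

    forget_text / forget_label: the training instance to unlearn and its true label
    support: list of (text, label) pairs that keep their correct labels
    query:   the input the model is finally asked to label
    """
    if len(labels) != 2 or forget_label not in labels:
        raise ValueError("this sketch only handles a known binary label set")
    # Flip the label of the instance that should be forgotten.
    flipped = labels[1] if forget_label == labels[0] else labels[0]
    blocks = [f"Input: {forget_text}\nLabel: {flipped}"]
    # Correctly labelled instances follow the flipped one.
    blocks += [f"Input: {text}\nLabel: {label}" for text, label in support]
    # The query is left unlabelled for the model to complete.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


prompt = build_unlearning_prompt(
    "great film", "positive",
    [("dull plot", "negative"), ("wonderful cast", "positive")],
    "loved it",
)
```

The resulting string is what would be sent to the model at inference time; no parameters are updated.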
    Lifelong Audio-video Masked Autoencoder with Forget-robust Localized Alignments. (arXiv:2310.08204v1 [cs.CV])
    We present a lifelong audio-video masked autoencoder that continually learns the multimodal representations from a video stream containing audio-video pairs, while its distribution continually shifts over time. Specifically, we propose two novel ideas to tackle the problem: (1) Localized Alignment: We introduce a small trainable multimodal encoder that predicts the audio and video tokens that are well-aligned with each other. This allows the model to learn only the highly correlated audiovisual patches with accurate multimodal relationships. (2) Forget-robust multimodal patch selection: We compare the relative importance of each audio-video patch between the current and past data pair to mitigate unintended drift of the previously learned audio-video representations. Our proposed method, FLAVA (Forget-robust Localized Audio-Video Alignment), therefore, captures the complex relationships between the audio and video modalities during training on a sequence of pre-training tasks while alleviating the forgetting of learned audiovisual correlations. Our experiments validate that FLAVA outperforms the state-of-the-art continual learning methods on several benchmark datasets under continual audio-video representation learning scenarios.
    Impact of Co-occurrence on Factual Knowledge of Large Language Models. (arXiv:2310.08256v1 [cs.CL])
    Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurring words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rare facts unseen during finetuning. Further research in mitigation will help build reliable language models by preventing potential errors. The code is available at https://github.com/CheongWoong/impact_of_cooccurrence.
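The debiasing step mentioned above, dropping samples whose subject-object co-occurrence count is high, can be illustrated with a toy filter. This is a sketch under simplifying assumptions: co-occurrence is counted as both strings appearing in the same document, and the function names, the `max_count` threshold, and the tiny corpus are hypothetical rather than taken from the linked repository.

```python
def cooccurrence_count(corpus, subject, obj):
    """Number of documents that mention both the subject and the object."""
    s, o = subject.lower(), obj.lower()
    return sum(1 for doc in corpus if s in doc.lower() and o in doc.lower())


def debias(samples, corpus, max_count):
    """Keep only (subject, object) facts whose co-occurrence count is at most max_count."""
    return [
        (subject, obj)
        for subject, obj in samples
        if cooccurrence_count(corpus, subject, obj) <= max_count
    ]


corpus = [
    "Paris is the capital of France.",
    "France celebrates in Paris every July.",
    "Canberra is the capital of Australia.",
]
samples = [("France", "Paris"), ("Australia", "Canberra")]

# ("France", "Paris") co-occurs in two documents and is filtered out at max_count=1.
kept = debias(samples, corpus, max_count=1)
```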
    Observatory: Characterizing Embeddings of Relational Tables. (arXiv:2310.07736v1 [cs.DB])
    Language models and specialized table embedding models have recently demonstrated strong performance on many tasks over tabular data. Researchers and practitioners are keen to leverage these models in many new application contexts; but limited understanding of the strengths and weaknesses of these models, and the table representations they generate, makes the process of finding a suitable model for a given task reliant on trial and error. There is an urgent need to gain a comprehensive understanding of these models to minimize inefficiency and failures in downstream usage. To address this need, we propose Observatory, a formal framework to systematically analyze embedding representations of relational tables. Motivated both by invariants of the relational data model and by statistical considerations regarding data distributions, we define eight primitive properties, and corresponding measures to quantitatively characterize table embeddings for these properties. Based on these properties, we define an extensible framework to evaluate language and table embedding models. We collect and synthesize a suite of datasets and use Observatory to analyze seven such models. Our analysis provides insights into the strengths and weaknesses of learned representations over tables. We find, for example, that some models are sensitive to table structure such as column order, that functional dependencies are rarely reflected in embeddings, and that specialized table embedding models have relatively lower sample fidelity. Such insights help researchers and practitioners better anticipate model behaviors and select appropriate models for their downstream tasks, while guiding researchers in the development of new models.
    Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation. (arXiv:2211.12345v4 [cs.LG] UPDATED)
    Larger and deeper networks generalise well despite their increased capacity to overfit. Understanding why this happens is theoretically and practically important. One recent approach looks at the infinitely wide limits of such networks and their corresponding kernels. However, these theoretical tools cannot fully explain finite networks as the empirical kernel changes significantly during gradient-descent-based training in contrast to infinite networks. In this work, we derive an iterative linearised training method as a novel empirical tool to further investigate this distinction, allowing us to control for sparse (i.e. infrequent) feature updates and quantify the frequency of feature learning needed to achieve comparable performance. We justify iterative linearisation as an interpolation between a finite analog of the infinite width regime, which does not learn features, and standard gradient descent training, which does. Informally, we also show that it is analogous to a damped version of the Gauss-Newton algorithm -- a second-order method. We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training, noting in particular how much less frequent feature learning is required to achieve comparable performance. We also show that feature learning is essential for good performance. Since such feature learning inevitably causes changes in the NTK kernel, we provide direct negative evidence for the NTK theory, which states the NTK kernel remains constant during training.
    Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach. (arXiv:2310.08088v1 [cs.LG])
    In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference against other models has been proven to be statistically significant.  ( 2 min )
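A two-fold (hierarchical) model for zero-inflated targets is commonly built as a classifier that decides zero versus non-zero, followed by a regressor trained only on the non-zero rows. The sketch below illustrates that general pattern on one feature; the threshold "stump" classifier, the least-squares line and all function names are assumptions chosen to keep the example dependency-free, not the models used in the paper.

```python
def fit_stump(xs, is_nonzero):
    """Stage 1: pick the threshold on x that best separates zero from non-zero targets."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == nz for x, nz in zip(xs, is_nonzero)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def fit_line(xs, ys):
    """Stage 2: ordinary least squares for y = a * x + b on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def fit_two_fold(xs, ys):
    """Train the classifier on all rows and the regressor on non-zero rows only."""
    threshold = fit_stump(xs, [y != 0 for y in ys])
    nonzero = [(x, y) for x, y in zip(xs, ys) if y != 0]
    a, b = fit_line([x for x, _ in nonzero], [y for _, y in nonzero])

    def predict(x):
        # Predict exactly zero unless stage 1 says the target is non-zero.
        return a * x + b if x >= threshold else 0.0

    return predict


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 0, 10, 12, 14]
predict = fit_two_fold(xs, ys)
print(predict(2), predict(6))
```

Because the regressor never sees the zero rows, its fit is not dragged toward zero, which is the main motivation for splitting the problem in two.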
    Invisible Threats: Backdoor Attack in OCR Systems. (arXiv:2310.08259v1 [cs.CR])
    Optical Character Recognition (OCR) is a widely used tool to extract text from scanned documents. Today, the state-of-the-art is achieved by exploiting deep neural networks. However, the cost of this performance is paid at the price of system vulnerability. For instance, in backdoor attacks, attackers compromise the training phase by inserting a backdoor in the victim's model that will be activated at testing time by specific patterns while leaving the overall model performance intact. This work proposes a backdoor attack for OCR resulting in the injection of non-readable characters from malicious input images. This simple but effective attack exposes the state-of-the-art OCR weakness, making the extracted text correct to human eyes but simultaneously unusable for the NLP application that uses OCR as a preprocessing step. Experimental results show that the attacked models successfully output non-readable characters for around 90% of the poisoned instances without harming their performance for the remaining instances.
    Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction. (arXiv:2107.14432v4 [cs.LG] UPDATED)
    We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, AdaHessian, and create a new class of optimizers, which are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly. We establish theoretically proven convergence guarantees in the stochastic convex settings, based on primal-dual methods. We evaluate the regularized effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that compared with the original optimizers with the post-processing procedure which uses the magnitude pruning method, the performance of the models can be significantly improved on the same sparsity level. Furthermore, in comparison to the cases without magnitude pruning, our methods can achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/dlrover/blob/master/tfplus.  ( 3 min )
    GIO: Gradient Information Optimization for Training Dataset Selection. (arXiv:2306.11670v2 [cs.LG] UPDATED)
    It is often advantageous to train models on a subset of the available train examples, because the examples are of variable quality or because one would like to train with fewer examples, without sacrificing performance. We present Gradient Information Optimization (GIO), a scalable, task-agnostic approach to this data selection problem that requires only a small set of (unlabeled) examples representing a target distribution. GIO begins from a natural, information-theoretic objective that is intractable in practice. Our contribution is in showing that it can be made highly scalable through a simple relaxation of the objective and a highly efficient implementation. In experiments with machine translation, spelling correction, and image recognition, we show that GIO delivers outstanding results with very small train sets. These findings are robust to different representation models and hyperparameters for GIO itself. GIO is task- and domain-agnostic and can be applied out-of-the-box to new datasets and domains.
    A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks. (arXiv:2310.07891v1 [stat.ML])
    Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.
    QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models. (arXiv:2310.08041v1 [cs.CL])
    Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform the magnitudes from activations to weights, which however offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low-bitwidth. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.  ( 3 min )
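The channel disassembly step described above can be illustrated with a small numeric sketch. It assumes that an outlier channel is split into T equal sub-channels and that the matching weight row is copied T times, which leaves the linear layer's output unchanged while shrinking the largest activation. The function names and the equal-split rule are assumptions for illustration; the adaptive choice of T and the channel assembly (merging) step are not shown.

```python
def disassemble(x, W, k, T):
    """Split channel k of activation x into T sub-channels of magnitude x[k] / T.

    The weight row for channel k is copied T times so that x @ W is preserved.
    """
    x_new = x[:k] + [x[k] / T] * T + x[k + 1:]
    W_new = W[:k] + [list(W[k]) for _ in range(T)] + W[k + 1:]
    return x_new, W_new


def linear(x, W):
    """Plain matrix-vector product: y[j] = sum_i x[i] * W[i][j]."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]


x = [0.5, 64.0, -1.0]                       # channel 1 is the outlier
W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # 3 input channels, 2 outputs

x2, W2 = disassemble(x, W, k=1, T=4)
print(linear(x, W), linear(x2, W2))
print(max(abs(v) for v in x), max(abs(v) for v in x2))
```

The output is identical before and after the split, while the activation range that the quantizer has to cover drops from 64 to 16 in this example.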
    TriRE: A Multi-Mechanism Learning Paradigm for Continual Knowledge Retention and Promotion. (arXiv:2310.08217v1 [cs.AI])
    Continual learning (CL) has remained a persistent challenge for deep neural networks due to catastrophic forgetting (CF) of previously learned tasks. Several techniques such as weight regularization, experience rehearsal, and parameter isolation have been proposed to alleviate CF. Despite their relative success, these research directions have predominantly remained orthogonal and suffer from several shortcomings, while missing out on the advantages of competing strategies. On the contrary, the brain continually learns, accommodates, and transfers knowledge across tasks by simultaneously leveraging several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, rarely resulting in CF. Inspired by how the brain exploits multiple mechanisms concurrently, we propose TriRE, a novel CL paradigm that encompasses retaining the most prominent neurons for each task, revising and solidifying the extracted knowledge of current and past tasks, and actively promoting less active neurons for subsequent tasks through rewinding and relearning. Across CL settings, TriRE significantly reduces task interference and surpasses different CL approaches considered in isolation.  ( 2 min )
    Improving Fast Minimum-Norm Attacks with Hyperparameter Optimization. (arXiv:2310.08177v1 [cs.LG])
    Evaluating the adversarial robustness of machine learning models using gradient-based attacks is challenging. In this work, we show that hyperparameter optimization can improve fast minimum-norm attacks by automating the selection of the loss function, the optimizer and the step-size scheduler, along with the corresponding hyperparameters. Our extensive evaluation involving several robust models demonstrates the improved efficacy of fast minimum-norm attacks when combined with hyperparameter optimization. We release our open-source code at https://github.com/pralab/HO-FMN.  ( 2 min )
    Data-Centric Learning from Unlabeled Graphs with Diffusion Model. (arXiv:2303.10108v2 [cs.LG] UPDATED)
    Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the knowledge learned from self-supervised tasks may not align with, and sometimes conflicts with, what the predictions need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing methods on fifteen tasks. Unlike in self-supervised learning, the performance improvement brought by unlabeled data is visible in the form of the generated labeled examples.  ( 2 min )
    L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes. (arXiv:1908.04628v3 [cs.LG] UPDATED)
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.
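The placing phase described above can be sketched as follows, assuming a trained pairwise preference classifier is already available. Here `is_greater` is a hypothetical stand-in for that classifier, and assigning the midpoint of the two neighbouring outcomes is one simple choice made for illustration rather than the paper's exact rule.

```python
def place(new_x, known, is_greater):
    """Place a new instance among known instances and read off an outcome value.

    known:      list of (features, outcome) pairs with known outcomes.
    is_greater: pairwise classifier; is_greater(a, b) is True if a is predicted
                to have a larger outcome than b (hypothetical interface).
    """
    ordered = sorted(known, key=lambda pair: pair[1])
    # Count how many known instances the new instance is predicted to beat.
    wins = sum(is_greater(new_x, x) for x, _ in ordered)
    if wins == 0:
        return ordered[0][1]
    if wins == len(ordered):
        return ordered[-1][1]
    # The new instance sits between ordered[wins - 1] and ordered[wins].
    return (ordered[wins - 1][1] + ordered[wins][1]) / 2


# Toy data: one scalar feature, heavy-tailed outcomes (e.g. copies sold).
known = [(1, 10), (2, 100), (3, 1000), (4, 100000)]
toy_classifier = lambda a, b: a > b  # stands in for the learned classifier

print(place(2.5, known, toy_classifier))
print(place(9.0, known, toy_classifier))
```

Because the prediction is anchored to outcomes of real neighbours, a new instance placed near the top of the ordering receives a large value rather than being shrunk toward the mean.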
    ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking. (arXiv:2310.08061v1 [q-bio.BM])
    Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.  ( 2 min )
    Lag-Llama: Towards Foundation Models for Time Series Forecasting. (arXiv:2310.08278v1 [cs.LG])
    Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.
    Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models. (arXiv:2310.08039v1 [cs.IR])
    Industrial systems, such as recommender systems and online advertising, are widely built on multi-stage architectures divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly suffer from the sample selection bias (SSB) problem because they ignore the entire-chain data dependence, resulting in sub-optimal performance. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from all cascaded stages to effectively alleviate the SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict the result of each stage, and introduce a sub-network routing strategy with $L_0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while keeping time consumption at an acceptable level, achieving a better trade-off between efficiency and effectiveness.
    LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios. (arXiv:2310.08348v1 [cs.LG])
    Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specifically, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.  ( 2 min )
    Why Train More? Effective and Efficient Membership Inference via Memorization. (arXiv:2310.08015v1 [cs.LG])
    Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs require query access to the data distribution (the same distribution where the private data is drawn) to train shadow models. By doing so, the adversary obtains models trained "with" or "without" samples drawn from the distribution, and analyzes the characteristics of the samples under consideration. The adversary is often required to train more than hundreds of shadow models to extract the signals needed for MIAs; this becomes the computational overhead of MIAs. In this paper, we propose that by strategically choosing the samples, MI adversaries can maximize their attack success while minimizing the number of shadow models. First, our motivational experiments suggest memorization as the key property explaining disparate sample vulnerability to MIAs. We formalize this through a theoretical bound that connects MI advantage with memorization. Second, we show sample complexity bounds that connect the number of shadow models needed for MIAs with memorization. Lastly, we confirm our theoretical arguments with comprehensive experiments; by utilizing samples with high memorization scores, the adversary can (a) significantly improve its efficacy regardless of the MIA used, and (b) reduce the number of shadow models by nearly two orders of magnitude compared to state-of-the-art approaches.  ( 2 min )
    NeRF2: Neural Radio-Frequency Radiance Fields. (arXiv:2305.06118v2 [cs.NI] UPDATED)
    Although Maxwell discovered the physical laws of electromagnetic waves 160 years ago, how to precisely model the propagation of an RF signal in an electrically large and complex environment remains a long-standing problem. The difficulty is in the complex interactions between the RF signal and the obstacles (e.g., reflection, diffraction, etc.). Inspired by the great success of using a neural network to describe the optical field in computer vision, we propose a neural radio-frequency radiance field, NeRF$^\textbf{2}$, which represents a continuous volumetric scene function that makes sense of an RF signal's propagation. Particularly, after training with a few signal measurements, NeRF$^\textbf{2}$ can tell how/what signal is received at any position when it knows the position of a transmitter. As a physical-layer neural network, NeRF$^\textbf{2}$ can take advantage of the learned statistic model plus the physical model of ray tracing to generate a synthetic dataset that meets the training demands of application-layer artificial neural networks (ANNs). Thus, we can boost the performance of ANNs by the proposed turbo-learning, which mixes the true and synthetic datasets to intensify the training. Our experiment results show that turbo-learning can enhance performance with an approximate 50% increase. We also demonstrate the power of NeRF$^\textbf{2}$ in the field of indoor localization and 5G MIMO.
    Quasi-Arithmetic Mixtures, Divergence Minimization, and Bregman Information. (arXiv:2209.07481v2 [cs.LG] UPDATED)
    Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior work has constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. We provide a comprehensive analysis of this 'centroid' property using Bregman divergences under a monotonic embedding of the density function, thereby associating common divergences such as Amari's and Renyi's ${\alpha}$-divergences, ${(\alpha,\beta)}$-divergences, and the Jensen-Shannon divergence with intermediate densities along an annealing path. Our analysis highlights the interplay between parametric families, quasi-arithmetic means, and divergence functions using the rho-tau Bregman divergence framework of Zhang 2004,2013.
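For orientation, the quasi-arithmetic mean construction mentioned above can be written out in its standard form. The notation is assumed for this note: $\tilde{\pi}_0$ and $\tilde{\pi}_1$ are the unnormalized endpoint densities, $\beta \in [0,1]$ is the annealing parameter, and $f$ is a strictly monotonic function.

```latex
\tilde{\pi}_\beta(z) \;=\; f^{-1}\!\Big( (1-\beta)\, f\big(\tilde{\pi}_0(z)\big) \;+\; \beta\, f\big(\tilde{\pi}_1(z)\big) \Big)

% Two familiar special cases:
%   f(u) = u       gives the arithmetic mixture path
\tilde{\pi}_\beta(z) \;=\; (1-\beta)\,\tilde{\pi}_0(z) + \beta\,\tilde{\pi}_1(z)
%   f(u) = \log u  gives the geometric annealing path
\tilde{\pi}_\beta(z) \;=\; \tilde{\pi}_0(z)^{\,1-\beta}\;\tilde{\pi}_1(z)^{\,\beta}
```

Other monotonic choices of $f$ interpolate between these two paths, which is the family of intermediate densities that the abstract relates to divergence minimization.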
    MemSAC: Memory Augmented Sample Consistency for Large Scale Unsupervised Domain Adaptation. (arXiv:2207.12389v2 [cs.CV] UPDATED)
    Practical real world datasets with plentiful categories introduce new challenges for unsupervised domain adaptation like small inter-class discriminability, that existing approaches relying on domain invariance alone cannot handle sufficiently well. In this work we propose MemSAC, which exploits sample level similarity across source and target domains to achieve discriminative transfer, along with architectures that scale to a large number of categories. For this purpose, we first introduce a memory augmented approach to efficiently extract pairwise similarity relations between labeled source and unlabeled target domain instances, suited to handle an arbitrary number of classes. Next, we propose and theoretically justify a novel variant of the contrastive loss to promote local consistency among within-class cross domain samples while enforcing separation between classes, thus preserving discriminative transfer from source to target. We validate the advantages of MemSAC with significant improvements over previous state-of-the-art on multiple challenging transfer tasks designed for large-scale adaptation, such as DomainNet with 345 classes and fine-grained adaptation on Caltech-UCSD birds dataset with 200 classes. We also provide in-depth analysis and insights into the effectiveness of MemSAC.
    Learning Joint Latent Space EBM Prior Model for Multi-layer Generator. (arXiv:2306.06323v2 [cs.CV] UPDATED)
    This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. Such joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection.  ( 2 min )
    Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs. (arXiv:2310.07895v1 [cs.LG])
    This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by exploring the combination of a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). It is demonstrated that successive time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of 98.04% on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters, and thus provides a method suitable for low-power devices.  ( 2 min )
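One common way to let an HMM correct per-frame classifier errors is Viterbi decoding, with the CNN's class probabilities used as emission scores and a transition matrix that makes section changes rare. The sketch below shows that general technique with made-up numbers. The two-section setup, the transition probabilities and the function name are assumptions, and the paper's actual HMM design may differ.

```python
import math


def viterbi(frame_probs, trans, prior, eps=1e-12):
    """Most likely state sequence given per-frame class probabilities.

    frame_probs[t][s]: classifier probability of state s at frame t (used as emission score).
    trans[r][s]:       probability of moving from state r to state s.
    prior[s]:          probability of starting in state s.
    """
    log = lambda p: math.log(max(p, eps))  # guard against log(0)
    n_states = len(prior)
    score = [log(prior[s]) + log(frame_probs[0][s]) for s in range(n_states)]
    back = []
    for probs in frame_probs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(probs[s]))
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy example: state 0 = earlier GI section, state 1 = later section.
frame_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
trans = [[0.9, 0.1], [0.01, 0.99]]   # going back to an earlier section is very unlikely
prior = [0.5, 0.5]

raw = [max(range(2), key=lambda s: p[s]) for p in frame_probs]
smoothed = viterbi(frame_probs, trans, prior)
print(raw, smoothed)
```

The per-frame argmax flips to state 1 at the third frame and back again, while the decoded sequence switches sections once, which is the kind of error correction the abstract attributes to the time-series analysis.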
    Participatory Personalization in Classification. (arXiv:2302.03874v2 [cs.LG] UPDATED)
    Machine learning models are often personalized with information that is protected, sensitive, self-reported, or costly to acquire. These models use information about people but do not facilitate nor inform their consent. Individuals cannot opt out of reporting personal information to a model, nor tell if they benefit from personalization in the first place. We introduce a family of classification models, called participatory systems, that let individuals opt into personalization at prediction time. We present a model-agnostic algorithm to learn participatory systems for personalization with categorical group attributes. We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks, benchmarking them with common approaches for personalization and imputation. Our results demonstrate that participatory systems can facilitate and inform consent while improving performance and data use across all groups who report personal data.
    Efficient Hyperdimensional Computing. (arXiv:2301.10902v2 [cs.LG] UPDATED)
    Hyperdimensional computing (HDC) is a method to perform classification that uses binary vectors with high dimensions and the majority rule. This approach has the potential to be energy-efficient and hence deemed suitable for resource-limited platforms due to its simplicity and massive parallelism. However, in order to achieve high accuracy, HDC sometimes uses hypervectors with tens of thousands of dimensions. This potentially negates its efficiency advantage. In this paper, we examine the necessity of such high dimensions and conduct a detailed theoretical analysis of the relationship between hypervector dimensions and accuracy. Our results demonstrate that as the dimension of the hypervectors increases, the worst-case/average-case HDC prediction accuracy with the majority rule decreases. Building on this insight, we develop HDC models that use binary hypervectors with dimensions orders of magnitude lower than those of state-of-the-art HDC models while maintaining equivalent or even improved accuracy and efficiency. For instance, on the MNIST dataset, we achieve 91.12% HDC accuracy in image classification with a dimension of only 64. Our methods perform operations that are only 0.35% of other HDC models with dimensions of 10,000. Furthermore, we evaluate our methods on ISOLET, UCI-HAR, and Fashion-MNIST datasets and investigate the limits of HDC computing.  ( 2 min )
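As a rough illustration of classification with binary hypervectors and the majority rule, the sketch below bundles each class's training vectors into a prototype by bitwise majority vote and assigns a query to the prototype with the smallest Hamming distance. The 8-bit vectors are hand-picked toy data, the encoding of raw inputs into hypervectors is omitted, and the function names are assumptions rather than anything taken from the paper.

```python
def majority_bundle(vectors):
    """Bitwise majority vote over a list of equal-length binary vectors."""
    n = len(vectors)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*vectors)]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def train(samples_by_class):
    """One prototype hypervector per class."""
    return {label: majority_bundle(vs) for label, vs in samples_by_class.items()}


def classify(prototypes, query):
    """Return the label whose prototype is closest in Hamming distance."""
    return min(prototypes, key=lambda label: hamming(prototypes[label], query))


samples_by_class = {
    "A": [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 0, 0, 0]],
    "B": [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 1, 1, 1]],
}
prototypes = train(samples_by_class)
label = classify(prototypes, [1, 1, 0, 1, 0, 0, 0, 1])
print(prototypes, label)
```

Every operation here is a bit comparison or a count, which is why the approach is attractive for resource-limited hardware and why the hypervector dimension directly drives its cost.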
    Does Synthetic Data Make Large Language Models More Efficient?. (arXiv:2310.07830v1 [cs.CL])
    Natural Language Processing (NLP) has undergone transformative changes with the advent of deep learning methodologies. One challenge persistently confronting researchers is the scarcity of high-quality, annotated datasets that drive these models. This paper explores the nuances of synthetic data generation in NLP, with a focal point on template-based question generation. By assessing its advantages, including data augmentation potential and the introduction of structured variety, we juxtapose these benefits against inherent limitations, such as the risk of overfitting and the constraints posed by pre-defined templates. Drawing from empirical evaluations, we demonstrate the impact of template-based synthetic data on the performance of modern transformer models. We conclude by emphasizing the delicate balance required between synthetic and real-world data, and the future trajectories of integrating synthetic data in model training pipelines. The findings aim to guide NLP practitioners in harnessing synthetic data's potential, ensuring optimal model performance in diverse applications.  ( 2 min )
    Spiral-Elliptical automated galaxy morphology classification from telescope images. (arXiv:2310.07740v1 [astro-ph.IM])
    The classification of galaxy morphologies is an important step in the investigation of theories of hierarchical structure formation. While human expert visual classification remains quite effective and accurate, it cannot keep up with the massive influx of data from emerging sky surveys. A variety of approaches have been proposed to classify large numbers of galaxies; these approaches include crowdsourced visual classification, and automated and computational methods, such as machine learning methods based on designed morphology statistics and deep learning. In this work, we develop two novel galaxy morphology statistics, descent average and descent variance, which can be efficiently extracted from telescope galaxy images. We further propose simplified versions of the existing image statistics concentration, asymmetry, and clumpiness, which have been widely used in the literature of galaxy morphologies. We utilize the galaxy image data from the Sloan Digital Sky Survey to demonstrate the effective performance of our proposed image statistics at accurately detecting spiral and elliptical galaxies when used as features of a random forest classifier.  ( 2 min )
    Joint Metrics Matter: A Better Standard for Trajectory Forecasting. (arXiv:2305.06292v2 [cs.RO] UPDATED)
    Multi-modal trajectory forecasting methods commonly evaluate using single-agent metrics (marginal metrics), such as minimum Average Displacement Error (ADE) and Final Displacement Error (FDE), which fail to capture joint performance of multiple interacting agents. Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group. Consequently, methods optimized for marginal metrics lead to overly optimistic estimations of performance, which is detrimental to progress in trajectory forecasting research. In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate. We demonstrate the importance of joint metrics as opposed to marginal metrics with quantitative evidence and qualitative examples drawn from the ETH / UCY and Stanford Drone datasets. We introduce a new loss function incorporating joint metrics that, when applied to a SOTA trajectory forecasting method, achieves a 7% improvement in JADE / JFDE on the ETH / UCY datasets with respect to the previous SOTA. Our results also indicate that optimizing for joint metrics naturally leads to an improvement in interaction modeling, as evidenced by a 16% decrease in mean collision rate on the ETH / UCY datasets with respect to the previous SOTA. Code is available at https://github.com/ericaweng/joint-metrics-matter.  ( 3 min )
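    The gap between marginal and joint metrics can be illustrated with a small sketch. It assumes the usual reading of the two quantities: marginal minADE picks the best of K samples separately for each agent, while joint ADE (JADE) picks one sample for the whole scene. The function names and the toy data are illustrative and are not taken from the paper's code.

```python
import math

def ade(pred, truth):
    """Average displacement error between two trajectories of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)

def marginal_min_ade(samples, truth):
    """samples[k][a] is agent a's trajectory in scene sample k.
    The best sample is chosen independently for every agent."""
    agents = range(len(truth))
    return sum(min(ade(s[a], truth[a]) for s in samples) for a in agents) / len(truth)

def joint_min_ade(samples, truth):
    """One sample is chosen for the whole scene (all agents together)."""
    agents = range(len(truth))
    return min(sum(ade(s[a], truth[a]) for a in agents) / len(truth) for s in samples)

# Toy scene: two agents, two timesteps, two joint samples.
truth = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
samples = [
    [[(0, 0), (1, 0)], [(0, 2), (1, 2)]],    # agent 0 exact, agent 1 off by 1
    [[(0, -1), (1, -1)], [(0, 1), (1, 1)]],  # agent 0 off by 1, agent 1 exact
]
marginal = marginal_min_ade(samples, truth)
joint = joint_min_ade(samples, truth)
```

    In this toy scene every agent is matched exactly by some sample, so the marginal metric is 0, yet no single sample gets both agents right, so the joint metric is 0.5. The joint value can never be smaller than the marginal one.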
    DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks. (arXiv:2310.07881v1 [cs.NI])
    Content Delivery Networks carry the majority of Internet traffic, and the increasing demand for video content as a major source of IP traffic across the Internet highlights the importance of caching and prefetching optimization algorithms. Prefetching aims to make data available in the cache before the requester places its request to reduce access time and improve the Quality of Experience on the user side. Prefetching is well investigated in operating systems, compiler instructions, in-memory cache, local storage systems, high-speed networks, and cloud systems. Traditional prefetching techniques are well adapted to a particular access pattern, but fail to adapt to sudden variations or randomization in workloads. This paper explores the use of reinforcement learning to tackle the changes in user access patterns and automatically adapt over time. To this end, we propose DeePref, a Deep Reinforcement Learning agent for online video content prefetching in Content Delivery Networks. DeePref is a prefetcher implemented on edge networks and is agnostic to hardware design, operating systems, and applications. Our results show that DeePref DRQN, using a real-world dataset, achieves a 17% increase in prefetching accuracy and a 28% increase in prefetching coverage on average compared to baseline approaches that use video content popularity as a building block to statically or dynamically make prefetching decisions. We also study the possibility of transfer learning of statistical models from one edge network into another, where unseen user requests from an unknown distribution are observed. In terms of transfer learning, the increases in prefetching accuracy and prefetching coverage are 30% and 10%, respectively. Our source code will be available on GitHub.  ( 3 min )
    Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling. (arXiv:2310.07786v1 [cs.LG])
    Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they excessively explore due to a lack of prioritization for information of enduring value, or are designed in ways that do not scale in modern applications with high-dimensional user-specific features and large action set, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms the state-of-the-art baselines.  ( 2 min )
    Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction. (arXiv:2310.08138v1 [cs.LG])
    Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. The complex and dynamic spatial-temporal dependencies make traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks are prominent, they often encounter challenges such as (1) ignoring the fixed graph that limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) lacking attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks: the single-step gated recurrent unit and the multi-step gated recurrent unit to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy with non-trivial margins compared to all twenty baseline methods.  ( 2 min )
    Robust 1-bit Compressed Sensing with Iterative Hard Thresholding. (arXiv:2310.08019v1 [cs.IT])
    In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from a minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements works for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to a $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.  ( 3 min )
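    As a rough illustration of the kind of iteration the abstract refers to, the sketch below implements a common textbook form of BIHT: a gradient-style step on the sign residual, hard thresholding to the $k$ largest entries, and projection back to the unit sphere. The step size `eta`, the normalization after each step, the convention sign(0) = +1, and the toy problem sizes are assumptions of this sketch, not details confirmed by the paper.

```python
import math
import random

def sign(v):
    # Convention assumed here: sign(0) = +1.
    return 1.0 if v >= 0 else -1.0

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v

def biht(A, y, k, steps=50, eta=1.0):
    """Binary iterative hard thresholding on sign measurements y = sign(A x)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        # Residual between observed signs and the signs the current estimate produces.
        residual = [y[i] - sign(sum(A[i][j] * x[j] for j in range(n))) for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = normalize(hard_threshold([x[j] + (eta / m) * grad[j] for j in range(n)], k))
    return x

# Toy problem: Gaussian measurements of a 3-sparse unit vector, 5% of signs flipped.
rng = random.Random(0)
n, k, m = 20, 3, 300
x_true = normalize([rng.gauss(0, 1) if j < k else 0.0 for j in range(n)])
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [sign(sum(a * b for a, b in zip(row, x_true))) for row in A]
for i in rng.sample(range(m), m // 20):
    y[i] = -y[i]  # corrupted measurements
x_hat = biht(A, y, k)
```

    The returned estimate is always $k$-sparse and has unit norm. How close it lands to `x_true` depends on the number of measurements and the fraction of flipped signs, which is the trade-off the abstract quantifies.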
    Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization. (arXiv:2305.19838v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is typically used to optimize an unknown function $f$ that is noisy and costly to evaluate, by exploiting an acquisition function that must be maximized at each optimization step. Even if provably asymptotically optimal BO algorithms are efficient at optimizing low-dimensional functions, scaling them to high-dimensional spaces remains an open problem, often tackled by assuming an additive structure for $f$. By doing so, BO algorithms typically introduce additional restrictive assumptions on the additive structure that reduce their applicability domain. This paper contains two main contributions: (i) we relax the restrictive assumptions on the additive structure of $f$, at the expense of weakening the maximization guarantees of the acquisition function, and (ii) we address the over-exploration problem for decentralized BO algorithms. To these ends, we propose DumBO, an asymptotically optimal decentralized BO algorithm that achieves very competitive performance against state-of-the-art BO algorithms, especially when the additive structure of $f$ comprises high-dimensional factors.  ( 2 min )
    Counterfactual Explanations for Time Series Forecasting. (arXiv:2310.08137v1 [cs.LG])
    Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. ForecastCF guides the perturbations by applying constraints to the forecasted values to obtain desired prediction outcomes. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare to two baselines. Our results show that ForecastCF outperforms the baseline in terms of counterfactual validity and data manifold closeness. Overall, our findings suggest that ForecastCF can generate meaningful and relevant counterfactual explanations for various forecasting tasks.  ( 2 min )
    Language Models As Semantic Indexers. (arXiv:2310.07815v1 [cs.IR])
    Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential information loss and there is usually an inherent mismatch between the distribution of embeddings within the latent space produced by text encoders and the anticipated distribution required for semantic indexing. Nevertheless, it is non-trivial to design a method that can learn the document's semantic representations and its hierarchical structure simultaneously, given that semantic IDs are discrete and sequentially structured, and the semantic supervision is deficient. In this paper, we introduce LMINDEXER, a self-supervised framework to learn semantic IDs with a generative language model. We tackle the challenge of sequential discrete ID by introducing a semantic indexer capable of generating neural sequential discrete representations with progressive training and contrastive learning. In response to the semantic supervision deficiency, we propose to train the model with a self-supervised document reconstruction objective. The learned semantic indexer can facilitate various downstream tasks, such as recommendation and retrieval. We conduct experiments on three tasks including recommendation, product search, and document retrieval on five datasets from various domains, where LMINDEXER outperforms competitive baselines significantly and consistently.
    Federated Learning from Small Datasets. (arXiv:2110.03469v3 [cs.LG] UPDATED)
    Federated learning allows multiple parties to collaboratively train a joint model without sharing local data. This enables applications of machine learning in settings of inherently distributed, undisclosable data such as in the medical domain. In practice, joint training is usually achieved by aggregating local models, for which local training objectives have to be similar in expectation to the joint (global) objective. Often, however, local datasets are so small that local objectives differ greatly from the global objective, causing federated learning to fail. We propose a novel approach that intertwines model aggregations with permutations of local models. The permutations expose each local model to a daisy chain of local datasets, resulting in more efficient training in data-sparse domains. This enables training on extremely small local datasets, such as patient data across hospitals, while retaining the training efficiency and privacy benefits of federated learning.
    Interpretable Diffusion via Information Decomposition. (arXiv:2310.07972v1 [cs.LG])
    Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque, making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by noticing a precise relationship between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model. Furthermore, pointwise estimates can be easily obtained as well, allowing us to ask questions about the relationships between specific images and captions. Decomposing information even further to understand which variables in a high-dimensional space carry information is a long-standing problem. For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to do unsupervised localization of objects in images, and to measure effects when selectively editing images through prompt interventions.
    ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets. (arXiv:2310.08096v1 [cs.LG])
    Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.
    ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification. (arXiv:2310.08036v1 [cs.NI])
    Recent research works have proposed machine learning models for classifying IoT devices connected to a network. However, there is still a practical challenge of not having all devices (and hence their traffic) available during the training of a model. This essentially means, during the operational phase, we need to classify new devices not seen during the training phase. To address this challenge, we propose ZEST -- a ZSL (zero-shot learning) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data for classifying devices. We carry out extensive experiments on real IoT traffic data; our experiments demonstrate i) ZEST achieves significant improvement (in terms of accuracy) over the baselines; ii) ZEST is able to better extract meaningful representations than LSTM which has been commonly used for modeling network traffic.  ( 2 min )
    CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping. (arXiv:2310.07855v1 [cs.CV])
    Leveraging nearest neighbor retrieval for self-supervised representation learning has proven beneficial with object-centric images. However, this approach faces limitations when applied to scene-centric datasets, where multiple objects within an image are only implicitly captured in the global representation. Such global bootstrapping can lead to undesirable entanglement of object representations. Furthermore, even object-centric datasets stand to benefit from a finer-grained bootstrapping approach. In response to these challenges, we introduce a novel Cross-Image Object-Level Bootstrapping method tailored to enhance dense visual representation learning. By employing object-level nearest neighbor bootstrapping throughout the training, CrIBo emerges as a notably strong and adequate candidate for in-context learning, leveraging nearest neighbor retrieval at test time. CrIBo shows state-of-the-art performance on the latter task while being highly competitive in more standard downstream segmentation tasks. Our code and pretrained models will be publicly available upon acceptance.  ( 2 min )
    CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity. (arXiv:2310.07969v1 [cs.CV])
    A major obstacle when attempting to train a machine learning system to evaluate facial clefts is the scarcity of large datasets of high-quality, ethics board-approved patient images. In response, we have built a deep learning-based cleft lip generator designed to produce an almost unlimited number of artificial images exhibiting high-fidelity facsimiles of cleft lip with wide variation. We undertook a transfer learning protocol testing different versions of StyleGAN-ADA (a generative adversarial network image generator incorporating adaptive data augmentation (ADA)) as the base model. Training images depicting a variety of cleft deformities were pre-processed to adjust for rotation, scaling, color adjustment and background blurring. The ADA modification of the primary algorithm permitted construction of our new generative model while requiring input of a relatively small number of training images. Adversarial training was carried out using 514 unique frontal photographs of cleft-affected faces to adapt a pre-trained model based on 70,000 normal faces. The Frechet Inception Distance (FID) was used to measure the similarity of the newly generated facial images to the cleft training dataset, while Perceptual Path Length (PPL) and the novel Divergence Index of Severity Histograms (DISH) measures were also used to assess the performance of the image generator that we dub CleftGAN. We found that StyleGAN3 with translation invariance (StyleGAN3-t) performed optimally as a base model. Generated images achieved a low FID reflecting a close similarity to our training input dataset of genuine cleft images. Low PPL and DISH measures reflected a smooth and semantically valid interpolation of images through the transfer learning process and a similar distribution of severity in the training and generated images, respectively.  ( 3 min )
    Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics. (arXiv:2310.08198v1 [cs.LG])
    Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. However, although it is possible to obtain useful models with the traditional approach, the process is time-consuming and expensive because of the need to sweep many different current-profile configurations. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, hence only the current profiles that are informative for future experiments are applied. Simulations and real experiments are used to show that the proposed approach gives models that are as accurate as those obtained with traditional DoE while using 85% fewer resources.  ( 2 min )
    Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines. (arXiv:2310.07940v1 [cs.LG])
    Researchers have long touted a vision of the future enabled by a proliferation of internet-of-things devices, including smart sensors, homes, and cities. Increasingly, embedding intelligence in such devices involves the use of deep neural networks. However, their storage and processing requirements make them prohibitive for cheap, off-the-shelf platforms. Overcoming those requirements is necessary for enabling widely-applicable smart devices. While many ways of making models smaller and more efficient have been developed, there is a lack of understanding of which ones are best suited for particular scenarios. More importantly for edge platforms, those choices cannot be analyzed in isolation from cost and user experience. In this work, we holistically explore how quantization, model scaling, and multi-modality interact with system components such as memory, sensors, and processors. We perform this hardware/software co-design from the cost, latency, and user-experience perspective, and develop a set of guidelines for optimal system design and model deployment for the most cost-constrained platforms. We demonstrate our approach using an end-to-end, on-device, biometric user authentication system using a $20 ESP-EYE board.  ( 2 min )
    CHIP: Contrastive Hierarchical Image Pretraining. (arXiv:2310.08304v1 [cs.CV])
    Few-shot object classification is the task of classifying objects in an image with a limited number of examples as supervision. We propose a one-shot/few-shot classification model that can classify an object of any unseen class into a relatively general category in a hierarchically based classification. Our model uses a three-level hierarchical contrastive loss based ResNet152 classifier for classifying an object based on its features extracted from an image embedding, not used during the training phase. For our experimentation, we have used a subset of the ImageNet (ILSVRC-12) dataset that contains only the animal classes for training our model and created our own dataset of unseen classes for evaluating our trained model. Our model provides satisfactory results in classifying the unknown objects into a generic category, which is discussed later in greater detail.  ( 2 min )
    On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism. (arXiv:2310.07852v1 [stat.ML])
    We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.  ( 2 min )
    The Expressive Power of Transformers with Chain of Thought. (arXiv:2310.07923v1 [cs.LG])
    Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad", i.e., generate and condition on a sequence of intermediate tokens before answering. Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer? We show that the answer is yes, but the amount of increase depends crucially on the amount of intermediate generation. For instance, we find that transformer decoders with a logarithmic number of decoding steps (w.r.t. the input length) push the limits of standard transformers only slightly, while a linear number of decoding steps adds a clear new ability (under standard complexity conjectures): recognizing all regular languages. Our results also imply that linear steps keep transformer decoders within context-sensitive languages, and polynomial steps make them recognize exactly the class of polynomial-time solvable problems -- the first exact characterization of a type of transformers in terms of standard complexity classes. Together, our results provide a nuanced framework for understanding how the length of a transformer's chain of thought or scratchpad impacts its reasoning power.  ( 2 min )
    Emulating the dynamics of complex systems using autoregressive models on manifolds (mNARX). (arXiv:2306.16335v2 [stat.CO] UPDATED)
    We propose a novel surrogate modelling approach to efficiently and accurately approximate the response of complex dynamical systems driven by time-varying exogenous excitations over extended time periods. Our approach, namely manifold nonlinear autoregressive modelling with exogenous input (mNARX), involves constructing a problem-specific exogenous input manifold that is optimal for constructing autoregressive surrogates. The manifold, which forms the core of mNARX, is constructed incrementally by incorporating the physics of the system, as well as prior expert and domain knowledge. Because mNARX decomposes the full problem into a series of smaller sub-problems, each with a lower complexity than the original, it scales well with the complexity of the problem, both in terms of training and evaluation costs of the final surrogate. Furthermore, mNARX synergizes well with traditional dimensionality reduction techniques, making it highly suitable for modelling dynamical systems with high-dimensional exogenous inputs, a class of problems that is typically challenging to solve. Since domain knowledge is particularly abundant in physical systems, such as those found in civil and mechanical engineering, mNARX is well suited for these applications. We demonstrate that mNARX outperforms traditional autoregressive surrogates in predicting the response of a classical coupled spring-mass system excited by a one-dimensional random excitation. Additionally, we show that mNARX is well suited for emulating very high-dimensional time- and state-dependent systems, even when affected by active controllers, by surrogating the dynamics of a realistic aero-servo-elastic onshore wind turbine simulator. In general, our results demonstrate that mNARX offers promising prospects for modelling complex dynamical systems, in terms of accuracy and efficiency.  ( 3 min )
    Leader-Follower Neural Networks with Local Error Signals Inspired by Complex Collectives. (arXiv:2310.07885v1 [cs.LG])
    The collective behavior of a network with heterogeneous, resource-limited information processing units (e.g., group of fish, flock of birds, or network of neurons) demonstrates high self-organization and complexity. These emergent properties arise from simple interaction rules where certain individuals can exhibit leadership-like behavior and influence the collective activity of the group. Motivated by the intricacy of these collectives, we propose a neural network (NN) architecture inspired by the rules observed in nature's collective ensembles. This NN structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train a leader-follower neural network (LFNN) by leveraging local error signals and optionally incorporating backpropagation (BP) and global loss. We investigate worker behavior and evaluate LFNNs through extensive experimentation. Our LFNNs trained with local error signals achieve significantly lower error rates than previous BP-free algorithms on MNIST and CIFAR-10 and even surpass BP-enabled baselines. In the case of ImageNet, our LFNN-l demonstrates superior scalability and outperforms previous BP-free algorithms by a significant margin.  ( 2 min )
    Towards Causal Deep Learning for Vulnerability Detection. (arXiv:2310.07958v1 [cs.SE])
    Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and cannot generalize well to out-of-distribution (OOD) data, e.g., when applying a trained model to unseen projects in the real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address the challenge, in this paper, we introduce causality into deep learning vulnerability detection. Our approach, CausalVul, consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied causal learning algorithms, specifically do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causality-based prediction. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance for all the state-of-the-art models and datasets we experimented with. To the best of our knowledge, this is the first work that introduces do-calculus-based causal learning to software engineering models and shows that it is indeed useful for improving model accuracy, robustness and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.  ( 2 min )
    D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning. (arXiv:2310.07931v1 [cs.LG])
    Analytical theories suggest that higher-quality data can lead to lower test errors in models trained on a fixed data budget. Moreover, a model can be trained on a lower compute budget without compromising performance if a dataset can be stripped of its redundancies. Coreset selection (or data pruning) seeks to select a subset of the training data so as to maximize the performance of models trained on this subset, also referred to as coreset. There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics. Optimizing for data diversity leads to a coreset that is biased towards easier samples, whereas, selection by difficulty ranking omits easy samples that are necessary for the training of deep learning models. This demonstrates that data diversity and importance scores are two complementary factors that need to be jointly considered during coreset selection. We represent a dataset as an undirected graph and propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection. D2 Pruning updates the difficulty scores of each example by incorporating the difficulty of its neighboring examples in the dataset graph. Then, these updated difficulty scores direct a graph-based sampling method to select a coreset that encapsulates both diverse and difficult regions of the dataset space. We evaluate supervised and self-supervised versions of our method on various vision and language datasets. Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates. Additionally, we find that using D2 Pruning for filtering large multimodal datasets leads to increased diversity in the dataset and improved generalization of pretrained models.  ( 3 min )
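The two message-passing steps described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the dictionary-based graph, the edge weights, the additive update in `forward_pass`, and the subtractive down-weighting in `select_coreset` are one plausible reading of the abstract, not the authors' implementation.

```python
def forward_pass(scores, graph):
    """Add each neighbour's difficulty, scaled by edge weight, to a node's own score."""
    return {
        node: scores[node] + sum(w * scores[nbr] for nbr, w in graph[node].items())
        for node in scores
    }


def select_coreset(scores, graph, budget):
    """Greedily pick the highest-scoring node, then down-weight its neighbours."""
    remaining = dict(scores)
    selected = []
    for _ in range(budget):
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        selected.append(best)
        # Reverse message passing (assumed form): neighbours of a selected
        # sample become less attractive, which pushes the selection apart.
        for nbr, w in graph[best].items():
            if nbr in remaining:
                remaining[nbr] -= w * best_score
    return selected


# Hypothetical toy data: "a" and "b" are near-duplicates, "c" is isolated.
toy_graph = {"a": {"b": 0.9}, "b": {"a": 0.9}, "c": {}}
toy_scores = {"a": 1.0, "b": 0.9, "c": 0.5}
coreset = select_coreset(forward_pass(toy_scores, toy_graph), toy_graph, budget=2)
```

In this toy case a plain top-2 ranking by difficulty would keep both near-duplicates, while the sketch keeps one of them plus the isolated sample.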
    Promoting Robustness of Randomized Smoothing: Two Cost-Effective Approaches. (arXiv:2310.07780v1 [cs.LG])
    Randomized smoothing has recently attracted attention in the field of adversarial robustness to provide provable robustness guarantees on smoothed neural network classifiers. However, existing works show that vanilla randomized smoothing usually does not provide good robustness performance and often requires (re)training techniques on the base classifier in order to boost the robustness of the resulting smoothed classifier. In this work, we propose two cost-effective approaches to boost the robustness of randomized smoothing while preserving its clean performance. The first approach introduces a new robust training method, AdvMacer, which combines adversarial training and robustness certification maximization for randomized smoothing. We show that AdvMacer can improve the robustness performance of randomized smoothing classifiers compared to SOTA baselines, while being 3x faster to train than the MACER baseline. The second approach introduces a post-processing method, EsbRS, which greatly improves the robustness certificate based on building model ensembles. We explore different aspects of model ensembles that have not been studied by prior works and propose a novel design methodology to further improve robustness of the ensemble based on our theoretical analysis.  ( 2 min )
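For readers unfamiliar with the baseline, here is a minimal sketch of the prediction step of vanilla randomized smoothing: classify many Gaussian-perturbed copies of the input and return the majority class. It does not implement AdvMacer or EsbRS, and it omits the certification step. The names `smoothed_predict` and `threshold_classifier` and the toy 1-D classifier are hypothetical.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, n_samples, seed=0):
    """Majority vote of the base classifier over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


def threshold_classifier(x):
    """Hypothetical base classifier: class 1 if the first feature is positive."""
    return 1 if x[0] > 0.0 else 0


prediction_far_positive = smoothed_predict(threshold_classifier, [5.0], sigma=0.5, n_samples=200)
prediction_far_negative = smoothed_predict(threshold_classifier, [-5.0], sigma=0.5, n_samples=200)
```

Inputs far from the decision boundary keep their label under noise; inputs near it are where the vote becomes uncertain and where training and ensembling choices matter.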
    First-Order Dynamic Optimization for Streaming Convex Costs. (arXiv:2310.07925v1 [math.OC])
    This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost function. We develop an approach to track the optimal solution with a bounded error. Unlike the existing results, our algorithm is executed only by using the first-order derivatives of the cost function which makes it computationally efficient for optimization with time-varying cost function. We compare our algorithms to the gradient descent algorithm and show why gradient descent is not an effective solution for optimization problems with time-varying cost. Several examples including solving a model predictive control problem cast as a convex optimization problem with a streaming time-varying cost function demonstrate our results.  ( 2 min )
    Local Graph Clustering with Noisy Labels. (arXiv:2310.08031v1 [cs.LG])
    The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been devoted to the development of fast local methods (i.e., without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of a graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.  ( 3 min )
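The label model in this setting is easy to state in code. The sketch below only generates the noisy binary labels (1 for the target cluster, 0 otherwise, each flipped independently with some probability); how the labels are turned into edge weights and how diffusion is run are not specified in the abstract and are not attempted here. The function name and parameters are hypothetical.

```python
import random


def noisy_labels(nodes, target_cluster, flip_prob, seed=0):
    """Label nodes by cluster membership, then flip each label independently."""
    rng = random.Random(seed)
    labels = {}
    for node in nodes:
        clean = 1 if node in target_cluster else 0
        flipped = rng.random() < flip_prob
        labels[node] = 1 - clean if flipped else clean
    return labels


nodes = list(range(6))
target = {0, 1, 2}
clean = noisy_labels(nodes, target, flip_prob=0.0)
inverted = noisy_labels(nodes, target, flip_prob=1.0)
partly_noisy = noisy_labels(nodes, target, flip_prob=0.2)
```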
    A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback. (arXiv:2301.13326v2 [cs.LG] UPDATED)
    We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm -- the offline algorithm can be used as a black box subroutine. To demonstrate the utility of the proposed framework, the proposed framework is applied to diverse applications in submodular maximization. The new CMAB algorithms for submodular maximization with knapsack constraints outperform a full-bandit method developed for the adversarial setting in experiments with real-world data.  ( 3 min )
    Learning to Simulate Tree-Branch Dynamics for Manipulation. (arXiv:2306.03410v2 [cs.RO] UPDATED)
    We propose to use a simulation driven inverse inference approach to model the dynamics of tree branches under manipulation. Learning branch dynamics and gaining the ability to manipulate deformable vegetation can help with occlusion-prone tasks, such as fruit picking in dense foliage, as well as moving overhanging vines and branches for navigation in dense vegetation. The underlying deformable tree geometry is encapsulated as coarse spring abstractions executed on parallel, non-differentiable simulators. The implicit statistical model defined by the simulator, reference trajectories obtained by actively probing the ground truth, and the Bayesian formalism, together guide the spring parameter posterior density estimation. Our non-parametric inference algorithm, based on Stein Variational Gradient Descent, incorporates biologically motivated assumptions into the inference process as neural network driven learnt joint priors; moreover, it leverages the finite difference scheme for gradient approximations. Real and simulated experiments confirm that our model can predict deformation trajectories, quantify the estimation uncertainty, and it can perform better when base-lined against other inference algorithms, particularly from the Monte Carlo family. The model displays strong robustness properties in the presence of heteroscedastic sensor noise; furthermore, it can generalise to unseen grasp locations.  ( 2 min )
    A Transfer-Learning-Based Prognosis Prediction Paradigm that Bridges Data Distribution Shift across EMR Datasets. (arXiv:2310.07799v1 [cs.LG])
    Because information about emerging diseases is limited, symptoms are hard to notice and recognize, so the window for clinical intervention can be missed. An effective prognostic model is expected to assist doctors in making the right diagnosis and designing a personalized treatment plan, so as to promptly prevent unfavorable outcomes. However, in the early stage of a disease, limited data collection and clinical experience, together with privacy and ethical concerns, may restrict the data available for reference, to the extent that even data labels are difficult to mark correctly. In addition, Electronic Medical Record (EMR) data of different diseases, or from different sources for the same disease, can exhibit serious cross-dataset feature misalignment, greatly reducing the efficiency of deep learning models. This article introduces a transfer learning method to build a transition model from source dataset to target dataset. By constraining the distribution shift of features generated in disparate domains, domain-invariant features that are exclusively relevant to downstream tasks are captured, so as to cultivate a unified domain-invariant encoder across various task domains and achieve better feature representation. Experimental results on several target tasks demonstrate that our proposed model outperforms competing baseline methods and converges faster in training, especially when data is limited. A multitude of experiments have proven the efficacy of our method in providing more accurate predictions concerning newly emergent pandemics and other diseases.  ( 3 min )
    Elastic Decision Transformer. (arXiv:2307.02484v5 [cs.LG] UPDATED)
    This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/  ( 2 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v4 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.  ( 2 min )
    GROOT: Learning to Follow Instructions by Watching Gameplay Videos. (arXiv:2310.08235v1 [cs.AI])
    We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. Code and video can be found on the website https://craftjarvis-groot.github.io.  ( 2 min )
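As background on the Elo ratings mentioned above, the textbook Elo model maps a rating difference to an expected score. The sketch below shows that formula and its inverse; the abstract does not say how the 70% figure relates to the reported ratings, so this is only an illustration of the standard model, and the function names are hypothetical.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for_win_rate(win_rate):
    """Rating difference at which the Elo model predicts the given expected score."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


gap_for_70_percent = rating_gap_for_win_rate(0.7)
round_trip = elo_expected_score(gap_for_70_percent, 0.0)
```

Under this model an expected score of 0.7 corresponds to a rating gap of roughly 147 points.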
    XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation. (arXiv:2310.08182v1 [cs.CV])
    The lack of standardized robustness metrics and the widespread reliance on numerous unrelated benchmark datasets for testing have created a gap between academically validated robust models and their often problematic practical adoption. To address this, we introduce XIMAGENET-12, an explainable benchmark dataset with over 200K images and 15,600 manual semantic annotations. Covering 12 categories from ImageNet to represent objects commonly encountered in practical life and simulating six diverse scenarios, including overexposure, blurring, color changing, etc., we further propose a novel robustness criterion that extends beyond model generation ability assessment. This benchmark dataset, along with related code, is available at https://sites.google.com/view/ximagenet-12/home. Researchers and practitioners can leverage this resource to evaluate the robustness of their visual models under challenging conditions and ultimately benefit from the demands of practical computer vision systems.  ( 2 min )
    Samples on Thin Ice: Re-Evaluating Adversarial Pruning of Neural Networks. (arXiv:2310.08073v1 [cs.LG])
    Neural network pruning has been shown to be an effective technique for reducing the network size, trading desirable properties like generalization and robustness to adversarial attacks for higher sparsity. Recent work has claimed that adversarial pruning methods can produce sparse networks while also preserving robustness to adversarial examples. In this work, we first re-evaluate three state-of-the-art adversarial pruning methods, showing that their robustness was indeed overestimated. We then compare pruned and dense versions of the same models, discovering that samples on thin ice, i.e., closer to the unpruned model's decision boundary, are typically misclassified after pruning. We conclude by discussing how this intuition may lead to designing more effective adversarial pruning methods in future work.  ( 2 min )
    Data driven modeling of self-similar dynamics. (arXiv:2310.08282v1 [cs.LG])
    Multiscale modeling of complex systems is crucial for understanding their intricacies. Data-driven multiscale modeling has emerged as a promising approach to tackle challenges associated with complex systems. On the other hand, self-similarity is prevalent in complex systems, hinting that large-scale complex systems can be modeled at a reduced cost. In this paper, we introduce a multiscale neural network framework that incorporates self-similarity as prior knowledge, facilitating the modeling of self-similar dynamical systems. For deterministic dynamics, our framework can discern whether the dynamics are self-similar. For uncertain dynamics, it can compare and determine which parameter set is closer to self-similarity. The framework allows us to extract scale-invariant kernels from the dynamics for modeling at any scale. Moreover, our method can identify the power law exponents in self-similar systems. Preliminary tests on the Ising model yielded critical exponents consistent with theoretical expectations, providing valuable insights for addressing critical phase transitions in non-equilibrium systems.  ( 2 min )
    To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer. (arXiv:2310.08078v1 [cs.CL])
    Choosing an appropriate tokenization scheme is often a bottleneck in low-resource cross-lingual transfer. To understand the downstream implications of text representation choices, we perform a comparative analysis on language models having diverse text representation modalities including 2 segmentation-based models (\texttt{BERT}, \texttt{mBERT}), 1 image-based model (\texttt{PIXEL}), and 1 character-level model (\texttt{CANINE}). First, we propose a scoring Language Quotient (LQ) metric capable of providing a weighted representation of both zero-shot and few-shot evaluation combined. Utilizing this metric, we perform experiments comprising 19 source languages and 133 target languages on three tasks (POS tagging, Dependency parsing, and NER). Our analysis reveals that image-based models excel in cross-lingual transfer when languages are closely related and share visually similar scripts. However, for tasks biased toward word meaning (POS, NER), segmentation-based models prove to be superior. Furthermore, in dependency parsing tasks, where word relationships play a crucial role, character-level models outperform the others. Finally, we propose a recommendation scheme based on our findings to guide model selection according to task and language requirements.  ( 2 min )
    Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation. (arXiv:2310.08056v1 [cs.LG])
    Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP Binary Classification problem on various dataset types - tabular and Image. We achieve these improvements with minimal computational overhead above standard supervised learning due to Belief Propagation, for large bag sizes, even for a million samples.  ( 2 min )
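To make the LLP setting concrete, the sketch below shows what supervision the learner actually sees: instance labels are grouped into bags and only each bag's positive fraction is released. It illustrates the problem setup only, not the Belief Propagation or embedding refinement steps; the function name and toy data are hypothetical.

```python
def bag_proportions(instance_labels, bag_ids):
    """Aggregate binary instance labels into one label proportion per bag."""
    totals = {}
    positives = {}
    for label, bag in zip(instance_labels, bag_ids):
        totals[bag] = totals.get(bag, 0) + 1
        positives[bag] = positives.get(bag, 0) + label
    return {bag: positives[bag] / totals[bag] for bag in totals}


# Hypothetical toy data: six instances split into two bags of three.
hidden_labels = [1, 0, 1, 1, 0, 0]
bags = ["bag0", "bag0", "bag0", "bag1", "bag1", "bag1"]
visible_supervision = bag_proportions(hidden_labels, bags)
```

Training would use only `visible_supervision` together with the instance features, while evaluation is still done per instance.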
    Overview of Physics-Informed Machine Learning Inversion of Geophysical Data. (arXiv:2310.08109v1 [physics.geo-ph])
    We review four types of algorithms for physics-informed machine learning (PIML) inversion of geophysical data. The unifying equation is given by the joint objective function $\epsilon$: $$\epsilon^{||-PIML} = \lambda_1 \overbrace{||{\bf W}^{ML}({\bf H}_{\bf w} {\bf d}^{obs}-{\bf m})||^2}^{NN} + \lambda_2 \overbrace{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}^{FWI} + \text{Regularizer},$$ where the optimal model ${\bf m}^*$ and weights ${\bf w}^*$ minimize $\epsilon$. Here, the matrix weights are given by the boldface symbol ${\bf W}$, and full waveform inversion (FWI) is typically computed using a finite-difference solution of the wave equation, where ${\bf L}$ represents the forward modeling operation of the wave equation as a function of the model ${\bf m}$. Also, a fully-connected neural network (NN) is used to compute the model ${\bf H}_{\bf w}{\bf d}^{obs} \approx {\bf m}$ from the observed input data ${\bf d}^{obs}$. The selection of weights $\lambda_i$ and the NN operations determine one of four different PIML algorithms. PIML offers potential advantages over standard FWI through its enhanced ability to avoid local minima and the option to locally train the inversion operator, minimizing the requirement for extensive training data for global applicability. However, the effectiveness of PIML relies on the similarity between the test and trained data. Nevertheless, a possible strategy to overcome this limitation involves initial pretraining of a PIML architecture with data from a broader region, followed by fine-tuning for specific data, a method reminiscent of the way large language models are pretrained and adapted for various tasks.  ( 2 min )
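The joint objective above can be evaluated numerically on a toy problem. The sketch below makes strong simplifying assumptions that are not from the abstract: both weighting matrices W are taken as the identity, the network H_w and the forward operator L are replaced by small fixed matrices, and the regularizer is a plain number. All names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def squared_norm(vector):
    return sum(x * x for x in vector)


def piml_objective(lam1, lam2, H, L, m, d_obs, regularizer=0.0):
    """lam1 * ||H d_obs - m||^2 + lam2 * ||L m - d_obs||^2 + regularizer."""
    nn_residual = [p - mi for p, mi in zip(matvec(H, d_obs), m)]
    fwi_residual = [p - di for p, di in zip(matvec(L, m), d_obs)]
    return lam1 * squared_norm(nn_residual) + lam2 * squared_norm(fwi_residual) + regularizer


# Toy linear stand-ins: L doubles the model, H halves the data.
L_op = [[2.0, 0.0], [0.0, 2.0]]
H_op = [[0.5, 0.0], [0.0, 0.5]]
d_obs = [2.0, 4.0]
loss_at_truth = piml_objective(1.0, 0.5, H_op, L_op, [1.0, 2.0], d_obs)
loss_off_truth = piml_objective(1.0, 0.5, H_op, L_op, [1.0, 1.0], d_obs)
```

Setting `lam1` or `lam2` to zero recovers a purely data-driven or purely FWI-style misfit, which is how the choice of weights selects among the algorithm variants in this simplified picture.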
    LGL-BCI: A Lightweight Geometric Learning Framework for Motor Imagery-Based Brain-Computer Interfaces. (arXiv:2310.08051v1 [cs.LG])
    Brain-Computer Interfaces (BCIs) are a groundbreaking technology for interacting with external devices using brain signals. Despite advancements, electroencephalogram (EEG)-based Motor Imagery (MI) tasks face challenges like amplitude and phase variability, and complex spatial correlations, with a need for smaller model size and faster inference. This study introduces the LGL-BCI framework, employing a Geometric Deep Learning Framework for EEG processing in non-Euclidean metric spaces, particularly the Symmetric Positive Definite (SPD) Manifold space. LGL-BCI offers robust EEG data representation and captures spatial correlations. We propose an EEG channel selection solution via a feature decomposition algorithm to reduce SPD matrix dimensionality, with a lossless transformation boosting inference speed. Extensive experiments show LGL-BCI's superior accuracy and efficiency compared to current solutions, highlighting geometric deep learning's potential in MI-BCI applications. The efficiency, assessed on two public EEG datasets and two real-world EEG devices, significantly outperforms the state-of-the-art solution in accuracy ($82.54\%$ versus $62.22\%$) with fewer parameters (64.9M compared to 183.7M).  ( 2 min )
    SimCKP: Simple Contrastive Learning of Keyphrase Representations. (arXiv:2310.08221v1 [cs.CL])
    Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.  ( 2 min )
    Multi-SpacePhish: Extending the Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning. (arXiv:2210.13660v3 [cs.CR] UPDATED)
    Existing literature on adversarial Machine Learning (ML) focuses either on showing attacks that break every ML model, or defenses that withstand most attacks. Unfortunately, little consideration is given to the actual feasibility of the attack or the defense. Moreover, adversarial samples are often crafted in the "feature-space", making the corresponding evaluations of questionable value. Simply put, the current situation does not allow one to estimate the actual threat posed by adversarial attacks, leading to a lack of secure ML systems. We aim to clarify such confusion in this paper. By considering the application of ML for Phishing Website Detection (PWD), we formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool an ML-PWD -- demonstrating that even perturbations in the "feature-space" are useful. Then, we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers. After that, we perform the first statistically validated assessment of state-of-the-art ML-PWD against 12 evasion attacks. Our evaluation shows (i) the true efficacy of evasion attempts that are more likely to occur; and (ii) the impact of perturbations crafted in different evasion-spaces. Our realistic evasion attempts induce a statistically significant degradation (3-10% at p<0.05), and their cheap cost makes them a subtle threat. Notably, however, some ML-PWD are immune to our most realistic attacks (p=0.22). Finally, as an additional contribution of this journal publication, we are the first to consider the intriguing case wherein an attacker introduces perturbations in multiple evasion-spaces at the same time. These new results show that simultaneously applying perturbations in the problem- and feature-space can cause a drop in the detection rate from 0.95 to 0.  ( 3 min )
    PRiSM: Enhancing Low-Resource Document-Level Relation Extraction with Relation-Aware Score Calibration. (arXiv:2309.13869v1 [cs.CL] CROSS LISTED)
    Document-level relation extraction (DocRE) aims to extract relations of all entity pairs in a document. A key challenge in DocRE is the cost of annotating such data which requires intensive human effort. Thus, we investigate the case of DocRE in a low-resource setting, and we find that existing models trained on low data overestimate the NA ("no relation") label, causing limited performance. In this work, we approach the problem from a calibration perspective and propose PRiSM, which learns to adapt logits based on relation semantic information. We evaluate our method on three DocRE datasets and demonstrate that integrating existing models with PRiSM improves performance by as much as 26.38 F1 score, while the calibration error drops as much as 36 times when trained with about 3% of data. The code is publicly available at https://github.com/brightjade/PRiSM.  ( 2 min )
    Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs. (arXiv:2309.15395v2 [cs.LG] UPDATED)
    This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an optimal policy with a high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs we discover, called limited stochasticity. The property states that for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs a near-optimal policy with a high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{\frac{4}{5}})$ under a model-free algorithm, where $K$ is the total number of episodes.  ( 2 min )
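A quick calculation shows how far apart the two regret rates are. The sketch below compares only the leading terms, sqrt(K) and K^(4/5); it ignores the constants and logarithmic factors hidden by the tilde-O notation, so the numbers indicate growth rates and are not actual regret values. The function name is hypothetical.

```python
import math


def leading_regret_terms(num_episodes):
    """Return (sqrt(K), K^(4/5)), the leading terms of the two bounds."""
    return math.sqrt(num_episodes), num_episodes ** 0.8


new_rate, old_rate = leading_regret_terms(10 ** 5)
ratio = old_rate / new_rate  # equals K^(0.3)
```

For K = 100,000 episodes the older rate's leading term is about 10,000 against roughly 316 for the square-root rate, a factor of K^0.3.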
    Rethinking the BERT-like Pretraining for DNA Sequences. (arXiv:2310.07644v2 [cs.AI] UPDATED)
    With the success of large-scale pretraining in NLP, there is an increasing trend of applying it to the domain of life sciences. In particular, pretraining methods based on DNA sequences have garnered growing attention due to their potential to capture generic information about genes. However, existing pretraining methods for DNA sequences largely rely on direct adoptions of BERT pretraining from NLP, lacking a comprehensive understanding and a specifically tailored approach. To address this research gap, we first conducted a series of exploratory experiments and gained several insightful observations: 1) In the fine-tuning phase of downstream tasks, when using K-mer overlapping tokenization instead of K-mer non-overlapping tokenization, both overlapping and non-overlapping pretraining weights show consistent performance improvement. 2) During the pre-training process, using K-mer overlapping tokenization quickly produces clear K-mer embeddings and reduces the loss to a very low level, while using K-mer non-overlapping tokenization results in less distinct embeddings and continuously decreases the loss. 3) Using overlapping tokenization causes the self-attention in the intermediate layers of pre-trained models to tend to overly focus on certain tokens, reflecting that these layers are not adequately optimized. In summary, overlapping tokenization can benefit the fine-tuning of downstream tasks but leads to inadequate pretraining with fast convergence. To unleash the pretraining potential, we introduce a novel approach called RandomMask, which gradually increases the task difficulty of BERT-like pretraining by continuously expanding its mask boundary, forcing the model to learn more knowledge. RandomMask is simple but effective, achieving top-tier performance on 26 of 28 datasets spanning 7 downstream tasks.  ( 3 min )
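The difference between the two tokenization schemes is easiest to see on a short sequence. In this minimal sketch, overlapping tokenization slides a window of length K by one base, while non-overlapping tokenization moves by K; dropping an incomplete trailing K-mer is an assumption made here, and the function name is hypothetical.

```python
def kmer_tokens(sequence, k, overlapping):
    """Split a DNA string into K-mers with stride 1 (overlapping) or stride k."""
    stride = 1 if overlapping else k
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


overlapping_tokens = kmer_tokens("ATGCGTAC", k=3, overlapping=True)
non_overlapping_tokens = kmer_tokens("ATGCGTAC", k=3, overlapping=False)
```

With overlapping tokens, each K-mer shares K-1 bases with its neighbours, so a masked token is largely determined by the tokens next to it. That is consistent with the fast loss decrease reported in observation 2.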
    Memorization Capacity of Multi-Head Attention in Transformers. (arXiv:2306.02010v2 [cs.LG] UPDATED)
    Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive. This paper investigates the memorization abilities of multi-head attention mechanisms, examining how many example sequences they can memorize, as a function of the number of heads and sequence length. Motivated by experimental findings on vision transformers, we introduce novel assumptions about the linear independence of input data, distinct from the commonly used general-position assumption. Under these assumptions, we demonstrate that an attention layer with $H$ heads, dimension $d$, and context size $n < d$, featuring $\Theta(Hd^2)$ parameters, can memorize $\Omega(Hn)$ examples. Our analysis sheds light on how different attention heads handle various example sequences, aided by the softmax operator's saturation property. We validate our findings through experiments on synthetic data.  ( 2 min )
    CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving. (arXiv:2310.07794v1 [cs.CV])
    Benchmarking is a common method for evaluating trajectory prediction models for autonomous driving. Existing benchmarks rely on datasets, which are biased towards more common scenarios, such as cruising, and distance-based metrics that are computed by averaging over all scenarios. Following such a regimen provides little insight into the properties of the models, both in terms of how well they can handle different scenarios and how admissible and diverse their outputs are. There exist a number of complementary metrics designed to measure the admissibility and diversity of trajectories; however, they suffer from biases, such as length of trajectories. In this paper, we propose a new benChmarking paRadIgm for evaluaTing trajEctoRy predIction Approaches (CRITERIA). Particularly, we propose 1) a method for extracting driving scenarios at varying levels of specificity according to the structure of the roads, models' performance, and data properties for fine-grained ranking of prediction models; 2) a set of new bias-free metrics for measuring diversity, by incorporating the characteristics of a given scenario, and admissibility, by considering the structure of roads and kinematic compliancy, motivated by real-world driving constraints. 3) Using the proposed benchmark, we conduct extensive experimentation on a representative set of the prediction models using the large scale Argoverse dataset. We show that the proposed benchmark can produce a more accurate ranking of the models and serve as a means of characterizing their behavior. We further present ablation studies to highlight contributions of different elements that are used to compute the proposed metrics.  ( 3 min )
    Dynamic Subgoal-based Exploration via Bayesian Optimization. (arXiv:1910.09143v5 [math.OC] UPDATED)
    Reinforcement learning in sparse-reward navigation environments with expensive and limited interactions is challenging and poses a need for effective exploration. Motivated by complex navigation tasks that require real-world training (when cheap simulators are not available), we consider an agent that faces an unknown distribution of environments and must decide on an exploration strategy. It may leverage a series of training environments to improve its policy before it is evaluated in a test environment drawn from the same environment distribution. Most existing approaches focus on fixed exploration strategies, while the few that view exploration as a meta-optimization problem tend to ignore the need for cost-efficient exploration. We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies. The algorithm adjusts a variety of levers -- the locations of the subgoals, the length of each episode, and the number of replications per trial -- in order to overcome the challenges of sparse rewards, expensive interactions, and noise. An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains. We also provide a theoretical foundation and prove that the method asymptotically identifies a near-optimal subgoal design.  ( 2 min )
    Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning. (arXiv:2301.10886v5 [cs.LG] UPDATED)
    We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real-time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite. Extensive simulation demonstrates that AIRS can outperform the benchmarking schemes and achieve superior performance with a simple architecture.  ( 2 min )
    Refined Mechanism Design for Approximately Structured Priors via Active Regression. (arXiv:2310.07874v1 [cs.GT])
    We consider the problem of a revenue-maximizing seller with a large number of items $m$ for sale to $n$ strategic bidders, whose valuations are drawn independently from high-dimensional, unknown prior distributions. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute, and, even when they can be found, are often rife with various counter-intuitive properties. In this paper, following a model introduced recently by Cai and Daskalakis, we consider the case that bidders' prior distributions can be well-approximated by a topic model. We design an active learning component, responsible for interacting with the bidders and outputting low-dimensional approximations of their types, and a mechanism design component, responsible for robustifying mechanisms for the low-dimensional model to work for the approximate types of the former component. On the active learning front, we cast our problem in the framework of Randomized Linear Algebra (RLA) for regression problems, allowing us to import several breakthrough results from that line of research, and adapt them to our setting. On the mechanism design front, we remove many restrictive assumptions of prior work on the type of access needed to the underlying distributions and the associated mechanisms. To the best of our knowledge, our work is the first to formulate connections between mechanism design and RLA for active learning of regression problems, opening the door for further applications of randomized linear algebra primitives to mechanism design.  ( 3 min )
    SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection. (arXiv:2310.08040v1 [cs.LG])
    Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfitting the OoD samples at hand. Synthetic samples, in contrast, are often generated based on features extracted from training data, rendering them less effective when the training and OoD data overlap heavily in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.  ( 2 min )
    Open-Set Knowledge-Based Visual Question Answering with Inference Paths. (arXiv:2310.08148v1 [cs.LG])
    Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and then makes a prediction among the candidates. Despite promising progress, there are two drawbacks with existing models. Firstly, modeling question-answering as multi-class classification limits the answer space to a preset corpus and lacks the ability of flexible reasoning. Secondly, the classifier merely considers "what is the answer" without "how to get the answer", and so cannot ground the answer in explicit reasoning paths. In this paper, we confront the challenge of explainable open-set KB-VQA, where the system is required to answer questions with entities in the wild and retain an explainable reasoning path. To resolve the aforementioned issues, we propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity). Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process. To comprehensively evaluate our model, we reformulate the benchmark dataset OK-VQA with manually corrected entity-level annotations and release it as ConceptVQA. Extensive experiments on real-world questions demonstrate that our framework is able not only to perform open-set question answering across the whole knowledge base but also to provide explicit reasoning paths.  ( 2 min )
    Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing. (arXiv:2306.06599v4 [cs.LG] UPDATED)
    Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments on several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at https://github.com/Wang-ML-Lab/variational-imbalanced-regression.  ( 2 min )
    A Generic Software Framework for Distributed Topological Analysis Pipelines. (arXiv:2310.08339v1 [cs.DC])
    This system paper presents a software framework for the support of topological analysis pipelines in a distributed-memory model. While several recent papers introduced topology-based approaches for distributed-memory environments, these were reporting experiments obtained with tailored, mono-algorithm implementations. In contrast, we describe in this paper a general-purpose, generic framework for topological analysis pipelines, i.e. a sequence of topological algorithms interacting together, possibly on distinct numbers of processes. Specifically, we instantiated our framework with the MPI model, within the Topology ToolKit (TTK). While developing this framework, we faced several algorithmic and software engineering challenges, which we document in this paper. We provide a taxonomy for the distributed-memory topological algorithms supported by TTK, depending on their communication needs, and provide examples of hybrid MPI+thread parallelizations. Detailed performance analyses show that parallel efficiencies range from 20% to 80% (depending on the algorithms), and that the MPI-specific preconditioning introduced by our framework induces a negligible computation time overhead. We illustrate the new distributed-memory capabilities of TTK with an example of an advanced analysis pipeline, combining multiple algorithms, run on the largest publicly available dataset we have found (120 billion vertices) on a standard cluster with 64 nodes (for a total of 1,536 cores). Finally, we provide a roadmap for the completion of TTK's MPI extension, along with generic recommendations for each algorithm communication category.  ( 3 min )
    Trustworthy Machine Learning. (arXiv:2310.08215v1 [cs.LG])
    As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machine learning technology. This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions. The book evolved from the homonymous course at the University of Tübingen, first offered in the Winter Semester of 2022/23. It is meant to be a stand-alone product accompanied by code snippets and various pointers to further sources on topics of TML. The dedicated website of the book is https://trustworthyml.io/.  ( 2 min )
    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift. (arXiv:2310.08237v1 [stat.ML])
    Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. This paper focuses on two types of covariate shift problems and establishes sharp convergence rates for a general loss function, providing a unified theoretical analysis that concurs with the optimal results in the literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.  ( 2 min )
    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics. (arXiv:2310.07990v1 [q-bio.GN])
    Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed method achieved r2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.  ( 3 min )
    Generative Modeling with Phase Stochastic Bridges. (arXiv:2310.07805v1 [cs.LG])
    Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.  ( 2 min )
    Optimizing Convolutional Neural Networks for Chronic Obstructive Pulmonary Disease Detection in Clinical Computed Tomography Imaging. (arXiv:2303.07189v3 [eess.IV] UPDATED)
    We aim to optimize the binary detection of Chronic Obstructive Pulmonary Disease (COPD) based on emphysema presence in the lung with convolutional neural networks (CNN) by exploring manually adjusted versus automated window-setting optimization (WSO) on computed tomography (CT) images. 7,194 CT images (3,597 with COPD; 3,597 healthy controls) from 78 subjects (43 with COPD; 35 healthy controls) were selected retrospectively (10.2018-12.2019) and preprocessed. For each image, intensity values were manually clipped to the emphysema window setting and a baseline 'full-range' window setting. Class-balanced train, validation, and test sets contained 3,392, 1,114, and 2,688 images. The network backbone was optimized by comparing various CNN architectures. Furthermore, automated WSO was implemented by adding a customized layer to the model. The image-level area under the Receiver Operating Characteristics curve (AUC) [lower, upper limit 95% confidence] was utilized to compare model variations. Repeated inference (n=7) on the test set showed that the DenseNet was the most efficient backbone and achieved a mean AUC of 0.80 [0.76, 0.85] without WSO. Comparably, with input images manually adjusted to the emphysema window, the DenseNet model predicted COPD with a mean AUC of 0.86 [0.82, 0.89]. By adding a customized WSO layer to the DenseNet, an optimal window in the proximity of the emphysema window setting was learned automatically, and a mean AUC of 0.82 [0.78, 0.86] was achieved. Detection of COPD with DenseNet models was improved by WSO of CT data to the emphysema window setting range.  ( 3 min )
    High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation. (arXiv:2304.02621v2 [cs.CV] UPDATED)
    Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes a heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax, and implicitly assume that classes are mutually exclusive, which turns out to be suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset show that our proposed add-on is well-suited for large-scale settings. Our code is available at https://github.com/arvijj/hfpl.  ( 3 min )
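    The contrast the abstract draws between a softmax (multinomial) posterior and independent per-class binary posteriors can be sketched in a few lines. This is only an illustration of the general idea with made-up logits; it is not the authors' importance-sampling or feature-similarity formulation.

```python
import math


def softmax(logits):
    """Multinomial posterior: class probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoids(logits):
    """Independent binary posteriors: one yes/no problem per class."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Hypothetical logits for an image region where two classes are both plausible.
logits = [4.0, 4.0, -3.0]
multinomial = softmax(logits)
binomial = sigmoids(logits)
print([round(p, 3) for p in multinomial])  # the two strong classes split the mass
print([round(p, 3) for p in binomial])     # both strong classes stay near 1
```

    Under softmax, neither strongly supported class can exceed 0.5 here, whereas the independent sigmoids let both remain confident, which is the mutual-exclusivity assumption the abstract refers to.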
    Differentially-Private Decision Trees and Provable Robustness to Data Poisoning. (arXiv:2305.15394v2 [cs.LG] UPDATED)
    Decision trees are interpretable models that are well-suited to non-linear learning problems. Much work has been done on extending decision tree learning algorithms with differential privacy, a system that guarantees the privacy of samples within the training data. However, current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit. These solutions create random decision nodes that reduce decision tree accuracy or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works do not support continuous features or leak information about them. We propose a new method called PrivaTree based on private histograms that chooses good splits while consuming a small privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking information about numerical features. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we demonstrate bounds for the expected accuracy and success rates of backdoor attacks against differentially-private learners. By leveraging the better privacy-utility trade-off of PrivaTree we are able to train decision trees with significantly better robustness against backdoor attacks compared to regular decision trees and with meaningful theoretical guarantees.  ( 2 min )
    A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors. (arXiv:2310.08287v1 [stat.ML])
    The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this end, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our codes.
    Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation. (arXiv:2310.08071v1 [cs.LG])
    Despite the great progress of unsupervised domain adaptation (UDA) with deep neural networks, current UDA models are opaque and cannot provide promising explanations, limiting their applications in scenarios that require safe and controllable model decisions. At present, a surge of work focuses on designing deep interpretable methods with adequate data annotations and only a few methods consider the distributional shift problem. Most existing interpretable UDA methods are post-hoc ones, which cannot facilitate the model learning process for performance enhancement. In this paper, we propose an inherently interpretable method, named Transferable Conceptual Prototype Learning (TCPL), which could simultaneously interpret and improve the processes of knowledge transfer and decision-making in UDA. To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. With the learned transferable prototypes, a self-predictive consistent pseudo-label strategy that fuses confidence, predictions, and prototype information is designed for selecting suitable target samples for pseudo annotations and gradually narrowing down the domain gap. Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-art methods.  ( 2 min )
    Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks. (arXiv:2310.07979v1 [cs.LG])
    Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We look specifically at the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that can augment existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. We evaluate the performance of Graph-SCP on synthetic weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 30-70% and achieves run time speedups up to 25x when compared to commercial solvers (Gurobi). Given a desired optimality threshold, Graph-SCP will improve upon it or even achieve 100% optimality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial run time. Graph-SCP can generalize to larger problem sizes and can be used with other conventional or ML-augmented CO solvers to lead to potential additional run time improvement.  ( 2 min )
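    For context on the "fast greedy solutions" the abstract contrasts with, here is a minimal sketch of the classic greedy heuristic for unweighted set cover. It is the textbook baseline, not Graph-SCP, and the instance is a made-up example chosen so that greedy returns a larger cover than the optimum.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets picked by the classic greedy heuristic.

    Each round picks the subset covering the most still-uncovered elements
    (ties go to the lowest index). Runs in polynomial time but may return
    more subsets than an optimal cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: subsets 1 and 2 together form an optimal cover,
# but greedy is lured by the larger subset 0 first.
universe = range(1, 7)
subsets = [{1, 2, 3, 4}, {1, 3, 5}, {2, 4, 6}]
cover = greedy_set_cover(universe, subsets)
print(cover)  # three subsets, versus an optimal cover of two
```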
    Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach. (arXiv:2310.07970v1 [cs.LG])
    Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.  ( 2 min )
    Discerning Temporal Difference Learning. (arXiv:2310.08091v1 [cs.LG])
    Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction error into the historical context. However, this approach often neglects the significance of historical states and the relative importance of propagating the TD error, influenced by challenges such as visitation imbalance or outcome noise. To address this, we propose a novel TD algorithm named discerning TD learning (DTD), which allows flexible emphasis functions, either predetermined or adapted during training, to allocate efforts effectively across states. We establish the convergence properties of our method within a specific class of emphasis functions and showcase its promising potential for adaptation to deep RL contexts. Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios.  ( 2 min )
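    As background for the TD($\lambda$) "memory trace" mentioned above, the following is a minimal tabular sketch of standard TD($\lambda$) with accumulating eligibility traces. It is the textbook baseline only; the emphasis functions that define DTD are not reproduced here, and the two-state episode is a made-up example.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Update a tabular value table in place from one episode.

    `episode` is a list of (state, reward, next_state) tuples, with
    next_state set to None at termination.
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        traces[state] += 1.0  # accumulating trace for the visited state
        for s in values:
            # The TD error is spread over past states in proportion to their trace.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam  # decay every trace after the update
    return values


values = {"A": 0.0, "B": 0.0}
td_lambda_episode(values, [("A", 0.0, "B"), ("B", 1.0, None)])
print(values)  # A receives a discounted share of the reward observed at B
```

    Standard TD($\lambda$) weights every past state only by recency; the abstract's point is that an emphasis function could reweight which states receive the error.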
    Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning. (arXiv:2310.07902v1 [cs.RO])
    In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the "single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.  ( 2 min )
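    One way to see why projecting everything onto a single tangent space can mislead is to compare distances on the unit sphere with distances between the projected points. The sketch below is an illustrative example under assumed inputs (the unit 2-sphere with the north pole as base point), not an experiment from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def geodesic_distance(x, y):
    """Great-circle distance between two unit vectors."""
    return math.acos(max(-1.0, min(1.0, dot(x, y))))


def log_map(p, x):
    """Riemannian logarithmic map on the unit sphere: tangent vector at p toward x.

    Undefined when x is antipodal to p; that case is not handled in this sketch.
    """
    theta = geodesic_distance(p, x)
    if theta < 1e-12:
        return [0.0] * len(p)
    c = dot(p, x)
    direction = [x_i - c * p_i for x_i, p_i in zip(x, p)]  # part of x orthogonal to p
    norm = math.sqrt(dot(direction, direction))
    return [theta * d / norm for d in direction]


p = (0.0, 0.0, 1.0)  # base point of the single tangent space
x = (1.0, 0.0, 0.0)  # two points on the equator
y = (0.0, 1.0, 0.0)

true_dist = geodesic_distance(x, y)
flat_dist = math.dist(log_map(p, x), log_map(p, y))
print(true_dist, flat_dist)  # the tangent-space distance overestimates the geodesic one
```

    Distances from the base point are preserved by the projection, but distances between other points are not, so a Euclidean learner applied in that one tangent space sees distorted geometry.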
    LEMON: Lossless model expansion. (arXiv:2310.07999v1 [cs.LG])
    Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present LosslEss MOdel ExpansioN (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.  ( 2 min )
    TabLib: A Dataset of 627M Tables with Context. (arXiv:2310.07875v1 [cs.CL])
    It is well-established that large, diverse datasets play a pivotal role in the performance of modern AI systems for text and image modalities. However, there are no datasets for tabular data of comparable size and diversity to those available for text and images. Thus we present "TabLib", a compilation of 627 million tables totaling 69 TiB, along with 867B tokens of context. TabLib was extracted from numerous file formats, including CSV, HTML, SQLite, PDF, Excel, and others, sourced from GitHub and Common Crawl. The size and diversity of TabLib offer considerable promise in the table modality, reminiscent of the original promise of foundational datasets for text and images, such as The Pile and LAION.  ( 2 min )
    Enhanced sampling of Crystal Nucleation with Graph Representation Learnt Variables. (arXiv:2310.07927v1 [cond-mat.stat-mech])
    In this study, we present a graph neural network-based learning approach using an autoencoder setup to derive low-dimensional variables from features observed in experimental crystal structures. These variables are then biased in enhanced sampling to observe state-to-state transitions and reliable thermodynamic weights. Our approach uses simple convolution and pooling methods. To verify the effectiveness of our protocol, we examined the nucleation of various allotropes and polymorphs of iron and glycine from their molten states. Our graph latent variables when biased in well-tempered metadynamics consistently show transitions between states and achieve accurate free energy calculations in agreement with experiments, both of which are indicators of dependable sampling. This underscores the strength and promise of our graph neural net variables for improved sampling. The protocol shown here should be applicable for other systems and with other sampling methods.  ( 2 min )
    Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v1 [cs.LG])
    We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.  ( 3 min )
    Large Language Models Are Zero-Shot Time Series Forecasters. (arXiv:2310.07820v1 [cs.LG])
    By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity, and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers, and poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.  ( 2 min )
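The idea of framing a forecast as next-token prediction over digits can be illustrated with a small sketch. This is not the authors' tokenization procedure: the function names `encode_series` and `decode_series`, the fixed two-decimal precision, and the space and comma separators are all assumptions chosen for illustration.

```python
def encode_series(values, decimals=2, digit_sep=" ", value_sep=" , "):
    """Render each value as fixed-precision digits (hypothetical format).

    The decimal point is dropped after scaling, and digits are separated
    so that a tokenizer is more likely to see one digit per token.
    """
    tokens = []
    for v in values:
        scaled = round(abs(v) * 10 ** decimals)  # e.g. 1.5 -> 150
        digits = digit_sep.join(str(scaled))     # "150" -> "1 5 0"
        sign = "-" + digit_sep if v < 0 else ""
        tokens.append(sign + digits)
    return value_sep.join(tokens)


def decode_series(text, decimals=2, digit_sep=" ", value_sep=" , "):
    """Invert encode_series for text produced with the same settings."""
    values = []
    for chunk in text.split(value_sep):
        compact = chunk.replace(digit_sep, "")   # "1 5 0" -> "150"
        values.append(int(compact) / 10 ** decimals)
    return values


prompt = encode_series([0.64, 1.5, -0.07])
print(prompt)
print(decode_series(prompt))
```

A language model would be asked to continue `prompt`, and the continuation would be passed through `decode_series` to recover numeric forecasts.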
    ASV Station Keeping under Wind Disturbances using Neural Network Simulation Error Minimization Model Predictive Control. (arXiv:2310.07892v1 [cs.RO])
    Station keeping is an essential maneuver for Autonomous Surface Vehicles (ASVs), particularly when they operate in confined spaces, carry out surveys that require the ASV to hold its position, or collaborate with other vehicles where the relative position affects the mission. However, this maneuver can become challenging for classic feedback controllers due to the need for an accurate model of the ASV dynamics and the environmental disturbances. This work proposes a Model Predictive Controller using Neural Network Simulation Error Minimization (NNSEM-MPC) to accurately predict the dynamics of the ASV under wind disturbances. The performance of the proposed scheme under wind disturbances is tested and compared against other controllers in simulation, using the Robot Operating System (ROS) and the multipurpose simulation environment Gazebo. A set of six tests was conducted by combining two wind speeds (3 m/s and 6 m/s) and three wind directions (0$^\circ$, 90$^\circ$, and 180$^\circ$). The simulation results clearly show the advantage of the NNSEM-MPC over the following methods: backstepping controller, sliding mode controller, simplified dynamics MPC (SD-MPC), neural ordinary differential equation MPC (NODE-MPC), and knowledge-based NODE MPC (KNODE-MPC). The proposed NNSEM-MPC approach performs better than the rest in 4 out of the 6 test conditions, and it is the second best in the 2 remaining test cases, reducing the mean position and heading error by at least 31% and 46% respectively across all the test cases. In terms of execution speed, the proposed NNSEM-MPC is at least 36% faster than the rest of the MPC controllers. The field experiments on two different ASV platforms showed that ASVs can effectively keep station using the proposed method, with a position error as low as $1.68$ m and a heading error as low as $6.14^{\circ}$ within time windows of at least $150$ s.  ( 3 min )
    NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration. (arXiv:2310.07896v1 [cs.RO])
    Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this paper, we describe how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration, with the latter providing the ability to search novel environments, and the former providing the ability to reach a user-specified goal once it has been located. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments, as compared to approaches that use subgoal proposals from generative models, or prior methods based on latent variable models. We instantiate our method by using a large-scale Transformer-based policy trained on data from multiple ground robots, with a diffusion model decoder to flexibly handle both goal-conditioned and goal-agnostic navigation. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods, and demonstrate significant improvements in performance and lower collision rates, despite utilizing smaller models than state-of-the-art approaches. For more videos, code, and pre-trained model checkpoints, see https://general-navigation-models.github.io/nomad/  ( 2 min )
    RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization. (arXiv:2310.07983v1 [cs.LG])
    Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in accelerating communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.  ( 2 min )
    A Review of Machine Learning Techniques in Imbalanced Data and Future Trends. (arXiv:2310.07917v1 [cs.LG])
    For over two decades, detecting rare events has been a challenging task among researchers in the data mining and machine learning domain. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed papers from archival journals and conference papers in an attempt to provide an in-depth review of various approaches in imbalanced learning from technical and application perspectives. This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains and create a general guideline for researchers in academia or industry who want to dive into the broad field of machine learning using large-scale imbalanced data.  ( 2 min )
    QArchSearch: A Scalable Quantum Architecture Search Package. (arXiv:2310.07858v1 [quant-ph])
    The current era of quantum computing has yielded several algorithms that promise high computational efficiency. While the algorithms are sound in theory and can provide potentially exponential speedup, there is little guidance on how to design proper quantum circuits to realize the appropriate unitary transformation to be applied to the input quantum state. In this paper, we present QArchSearch, an AI-based quantum architecture search package with the QTensor library as a backend that provides a principled and automated approach to finding the best model given a task and input quantum state. We show that the search package is able to efficiently scale the search to large quantum circuits and enables the exploration of more complex models for different quantum applications. QArchSearch runs at scale and high efficiency on high-performance computing systems using a two-level parallelization scheme on both CPUs and GPUs, which has been demonstrated on the Polaris supercomputer.  ( 2 min )
    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. (arXiv:2310.07831v1 [cs.LG])
    Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.  ( 3 min )
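The linear decay schedule described above, with stepsize proportional to $1 - t/T$, can be written in a few lines. This is a minimal sketch: the function name `linear_decay` and the `base_lr` parameter are illustrative assumptions, and the paper's refined, gradient-norm-based schedules are not reproduced here.

```python
def linear_decay(base_lr, t, total_steps):
    """Stepsize proportional to 1 - t/T (hypothetical helper).

    base_lr     -- stepsize at t = 0 (an assumed parameter name)
    t           -- current iteration, 0 <= t <= total_steps
    total_steps -- total number of steps T
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(t / total_steps, 0.0), 1.0)  # clamp t/T into [0, 1]
    return base_lr * (1.0 - frac)


# Stepsizes at the start, midpoint, and end of a 100-step run.
for step in (0, 50, 100):
    print(step, linear_decay(0.1, step, 100))
```

The clamp only guards against out-of-range `t`; inside the valid range the function is exactly `base_lr * (1 - t/T)`.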
    Feature Learning and Generalization in Deep Networks with Orthogonal Weights. (arXiv:2310.07765v1 [cs.LG])
    Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.  ( 2 min )
    Faithfulness Measurable Masked Language Models. (arXiv:2310.07819v1 [cs.CL])
    A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric rests on the idea that if tokens are truly important, then masking them should result in worse model performance. However, token masking introduces out-of-distribution issues, and existing solutions are computationally expensive and employ proxy models. Furthermore, other metrics are very limited in scope. In this work, we propose an inherently faithfulness measurable model that addresses these challenges. This is achieved by using a novel fine-tuning method that incorporates masking, such that masking tokens becomes in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to various tasks and validate it using statistical in-distribution tests. Additionally, because masking is in-distribution, importance measures which themselves use masking become more faithful; thus our model becomes more explainable.  ( 2 min )
    Using Spark Machine Learning Models to Perform Predictive Analysis on Flight Ticket Pricing Data. (arXiv:2310.07787v1 [cs.LG])
    This paper discusses the predictive performance of, and the processes applied to, flight pricing data, evaluated with $R^2$ (R-squared) and RMSE on a large dataset, originally from Expedia.com, consisting of approximately 20 million records (4.68 gigabytes). The project aims to determine the best models usable in the real world to predict airline ticket fares for non-stop flights across the US. Therefore, good generalization capability and optimized processing times are important measures for the model. We will discover key business insights utilizing feature importance and discuss the process and tools used for our analysis. Four regression machine learning algorithms were utilized: Random Forest, Gradient Boost Tree, Decision Tree, and Factorization Machines, with Cross Validator and Training Validator functions used to assess performance and generalization capability.  ( 2 min )
    Self-supervised Representation Learning From Random Data Projectors. (arXiv:2310.07756v1 [cs.LG])
    Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with application-specific data augmentation constraints. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. Due to its wide applicability and strong empirical results, we argue that learning from randomness is a fruitful research direction worthy of attention and further study.  ( 2 min )
    Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning. (arXiv:2310.07720v1 [cs.LG])
    Activation functions (AFs) are crucial components of deep neural networks (DNNs), having a significant impact on their performance. An activation function in a DNN is typically a smooth, nonlinear function that transforms an input signal into an output signal for the subsequent layer. In this paper, we propose the Parametric Leaky Tanh (PLTanh), a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU (LReLU) activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs, consistent with the behavior of LReLU. By integrating the unique advantages of these two diverse activation functions, PLTanh facilitates the learning of more intricate nonlinear relationships within the network. This paper presents an empirical evaluation of PLTanh against established activation functions, namely ReLU, LReLU, and ALReLU utilizing five diverse datasets.  ( 2 min )
    Visual Forecasting as a Mid-level Representation for Avoidance. (arXiv:2310.07724v1 [cs.RO])
    The challenge of navigation in environments with dynamic objects continues to be a central issue in the study of autonomous agents. While predictive methods hold promise, their reliance on precise state information makes them less practical for real-world implementation. This study presents visual forecasting as an innovative alternative. By introducing intuitive visual cues, this approach projects the future trajectories of dynamic objects to improve agent perception and enable anticipatory actions. Our research explores two distinct strategies for conveying predictive information through visual forecasting: (1) sequences of bounding boxes, and (2) augmented paths. To validate the proposed visual forecasting strategies, we initiate evaluations in simulated environments using the Unity engine and then extend these evaluations to real-world scenarios to assess both practicality and effectiveness. The results confirm the viability of visual forecasting as a promising solution for navigation and obstacle avoidance in dynamic environments.  ( 2 min )
  • Open

    $L^1$ Estimation: On the Optimality of Linear Estimators. (arXiv:2309.09129v2 [math.ST] UPDATED)
    Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
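As a worked illustration of the easy direction of this characterization (a standard Gaussian-conjugacy computation, not the paper's proof), assume a zero-mean Gaussian prior with variance $\sigma^2$, a symbol introduced here for the example. The posterior is then Gaussian, so its median equals its mean and is linear in $y$:

```latex
\begin{align*}
X &\sim \mathcal{N}(0, \sigma^2), \qquad Z \sim \mathcal{N}(0, 1), \qquad Y = X + Z, \\
X \mid Y = y &\sim \mathcal{N}\!\left( \frac{\sigma^2}{1 + \sigma^2}\, y,\ \frac{\sigma^2}{1 + \sigma^2} \right), \\
\operatorname{median}(X \mid Y = y) &= \mathbb{E}[X \mid Y = y] = \frac{\sigma^2}{1 + \sigma^2}\, y .
\end{align*}
```

The abstract's claim is the converse: under the $L^1$ criterion, no non-Gaussian prior yields a conditional median that is linear in $y$.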
    A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors. (arXiv:2310.08287v1 [stat.ML])
    The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this end, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our code.
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v9 [stat.ML] UPDATED)
    This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR) that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistically strict assumptions on the data-generating process. Through methodical experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.
    Characterizing climate pathways using feature importance on echo state networks. (arXiv:2310.08495v1 [stat.ML])
    The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatio-temporal climate data. Like other neural networks, ESNs are non-interpretable black-box models, which poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatio-temporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we select a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which we use as a proxy for an artificial stratospheric aerosol injection. Using the proposed approach, we are able to characterize relationships between pathway variables associated with this event.  ( 3 min )
    Impact of multi-armed bandit strategies on deep recurrent reinforcement learning. (arXiv:2310.08331v1 [stat.ML])
    Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL) is the exploration-exploitation trade-off: an autonomous agent has to balance exploiting its current knowledge of the environment to maximize the cumulative reward against exploring actions that improve that knowledge, hopefully leading to higher reward values. Another relevant issue is the full observability of the states, which cannot be assumed in all applications, for example when only 2D images are used as input to an RL approach for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance the exploration-exploitation trade-off on partially observable systems for steering prediction in an autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted and evaluated the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.  ( 2 min )
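The difference between the epsilon-greedy and Softmax (Boltzmann) exploration strategies named above can be sketched over a plain list of action values. This is a generic textbook illustration, not the paper's implementation: the function names, the `temperature` parameter, and the example Q-values are assumptions, and the Deep Recurrent Q-Network that would produce the Q-values is omitted.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniform random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; higher temperature means more exploration."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


rng = random.Random(0)
q = [0.1, 0.9, 0.3]  # hypothetical action values
print(epsilon_greedy(q, epsilon=0.1, rng=rng))
print(softmax_probs(q, temperature=0.5))
print(softmax_action(q, temperature=0.5, rng=rng))
```

Epsilon-greedy explores uniformly regardless of how bad an action looks, whereas the Softmax rule explores in proportion to the estimated values, which is one reason the two can behave differently in practice.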
    Clustering Three-Way Data with Outliers. (arXiv:2310.05288v2 [stat.ML] UPDATED)
    Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.  ( 2 min )
    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v1 [cs.LG])
    Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making processes. For example, to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.  ( 3 min )
    Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation. (arXiv:2211.12345v4 [cs.LG] UPDATED)
    Larger and deeper networks generalise well despite their increased capacity to overfit. Understanding why this happens is theoretically and practically important. One recent approach looks at the infinitely wide limits of such networks and their corresponding kernels. However, these theoretical tools cannot fully explain finite networks as the empirical kernel changes significantly during gradient-descent-based training in contrast to infinite networks. In this work, we derive an iterative linearised training method as a novel empirical tool to further investigate this distinction, allowing us to control for sparse (i.e. infrequent) feature updates and quantify the frequency of feature learning needed to achieve comparable performance. We justify iterative linearisation as an interpolation between a finite analog of the infinite width regime, which does not learn features, and standard gradient descent training, which does. Informally, we also show that it is analogous to a damped version of the Gauss-Newton algorithm -- a second-order method. We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training, noting in particular how much less frequent feature learning is required to achieve comparable performance. We also show that feature learning is essential for good performance. Since such feature learning inevitably causes changes in the NTK kernel, we provide direct negative evidence for the NTK theory, which states the NTK kernel remains constant during training.  ( 3 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v4 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.  ( 2 min )
    A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks. (arXiv:2310.07891v1 [stat.ML])
    Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.  ( 2 min )
    Log-Gaussian Gamma Processes for Training Bayesian Neural Networks in Raman and CARS Spectroscopies. (arXiv:2310.08055v1 [stat.AP])
    We propose an approach utilizing gamma-distributed random variables, coupled with log-Gaussian modeling, to generate synthetic datasets suitable for training neural networks. This addresses the challenge of limited real observations in various applications. We apply this methodology to both Raman and coherent anti-Stokes Raman scattering (CARS) spectra, using experimental spectra to estimate gamma process parameters. Parameter estimation is performed using Markov chain Monte Carlo methods, yielding a full Bayesian posterior distribution for the model which can be sampled for synthetic data generation. Additionally, we model the additive and multiplicative background functions for Raman and CARS with Gaussian processes. We train two Bayesian neural networks to estimate parameters of the gamma process which can then be used to estimate the underlying Raman spectrum and simultaneously provide uncertainty through the estimation of parameters of a probability distribution. We apply the trained Bayesian neural networks to experimental Raman spectra of phthalocyanine blue, aniline black, naphthol red, and red 264 pigments and also to experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose. The results agree with deterministic point estimates for the underlying Raman and CARS spectral signatures.  ( 2 min )
    Learning to Act from Actionless Videos through Dense Correspondences. (arXiv:2310.08576v1 [cs.RO])
    In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that ``hallucinate'' a robot executing actions, in combination with dense correspondences between frames, our approach can infer the closed-form action to execute in an environment without the need for any explicit action labels. This unique capability allows us to train the policy solely based on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.  ( 2 min )
    A Complete Recipe for Diffusion Generative Models. (arXiv:2303.01748v2 [cs.LG] UPDATED)
    Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at \url{https://github.com/mandt-lab/PSLD}.  ( 2 min )
    Smoothed $f$-Divergence Distributionally Robust Optimization. (arXiv:2306.14041v2 [math.OC] UPDATED)
    In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple ``margin'' adjustments to SAA and other DRO approaches: it attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly tightest in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback-Leibler (KL) divergence smoothed by the Wasserstein or L\'evy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergence, are not harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.  ( 2 min )
    On Regularized Sparse Logistic Regression. (arXiv:2309.05925v2 [cs.LG] UPDATED)
    Sparse logistic regression performs classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with a nonconvex regularization term. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to a nonconvex regularization term, as long as a certain requirement is satisfied. In addition, we also utilize a different line search criterion to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.  ( 2 min )
    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. (arXiv:2310.07831v1 [cs.LG])
    Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.  ( 3 min )
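    The linear decay schedule described in the abstract above sets the step size proportionally to $1 - t/T$. A minimal sketch follows; the function name and the `base_lr` scaling constant are assumptions introduced for illustration, and the paper's gradient-norm-based refinement is not reproduced here.

```python
def linear_decay_lr(base_lr, t, total_steps):
    """Step size proportional to 1 - t/T, with t the current step and T the total steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= t <= total_steps:
        raise ValueError("t must lie in [0, total_steps]")
    return base_lr * (1.0 - t / total_steps)
```

    For example, `[linear_decay_lr(0.1, t, 4) for t in range(5)]` starts at the base rate and reaches zero at the final step.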
    Differentially Private Non-convex Learning for Multi-layer Neural Networks. (arXiv:2310.08425v1 [cs.LG])
    This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.  ( 2 min )
    Conditional Sig-Wasserstein GANs for Time Series Generation. (arXiv:2006.05421v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have been extremely successful in generating samples from seemingly high-dimensional probability measures. However, these methods struggle to capture the temporal dependence of joint probability distributions induced by time-series data. Furthermore, long time-series data streams hugely increase the dimension of the target space, which may render generative modelling infeasible. To overcome these challenges, motivated by the autoregressive models in econometrics, we are interested in the conditional distribution of future time series given the past information. We propose the generic conditional Sig-WGAN framework by integrating Wasserstein-GANs (WGANs) with mathematically principled and efficient path feature extraction called the signature of a path. The signature of a path is a graded sequence of statistics that provides a universal description for a stream of data, and its expected value characterises the law of the time-series model. In particular, we develop the conditional Sig-$W_1$ metric, which captures the conditional joint law of time series models, and use it as a discriminator. The signature feature space enables the explicit representation of the proposed discriminators, which alleviates the need for expensive training. We validate our method on both synthetic and empirical datasets and observe that our method consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.  ( 3 min )
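    To make "the signature of a path" concrete, the sketch below computes its first two levels for a piecewise-linear path using Chen's identity. It is an illustration only: the function name is an assumption, the truncation at depth 2 is a choice made here, and nothing in it reproduces the conditional Sig-$W_1$ metric from the paper.

```python
import numpy as np


def signature_depth2(path):
    """First two signature levels of a piecewise-linear path given as a (T, d) array.

    level1[i]    = total increment of coordinate i
    level2[i, j] = iterated integral of dx^i then dx^j over the path
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity: append one linear segment with increment delta.
        # A single linear segment has second level 0.5 * delta (outer) delta.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2
```

    The antisymmetric part of `level2` is the Lévy area, which records the order in which coordinates move and is therefore sensitive to temporal dependence that the plain increments miss.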
    An interpretable neural network-based non-proportional odds model for ordinal regression. (arXiv:2303.17823v3 [stat.ME] UPDATED)
    This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from conventional approaches to ordinal regression with non-proportional models in several ways: (1) N$^3$POM is designed to directly handle continuous responses, whereas standard methods typically treat de facto ordered continuous variables as discrete, (2) instead of estimating response-dependent finite coefficients of linear models from discrete responses as is done in conventional approaches, we train a non-linear neural network to serve as a coefficient function. Thanks to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region in the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for effectively training the neural network. We apply N$^3$POM to several real-world datasets.  ( 2 min )
    Generalization bounds for neural ordinary differential equations and deep residual networks. (arXiv:2305.06648v2 [stat.ML] UPDATED)
    Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields in particular a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.  ( 2 min )
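    The abstract above says the bound involves the magnitude of the difference between successive weight matrices. One simple way to measure such a quantity is sketched below. The choice of the Frobenius norm, the plain sum over layers, and the function name are assumptions made for illustration; the exact norm and scaling used in the paper are not stated in this abstract.

```python
import numpy as np


def successive_weight_variation(weights):
    """Sum over layers k of the Frobenius norm of W_{k+1} - W_k.

    `weights` is a sequence of equally shaped arrays, one per residual layer.
    """
    weights = [np.asarray(w, dtype=float) for w in weights]
    return float(sum(np.linalg.norm(b - a) for a, b in zip(weights[:-1], weights[1:])))
```

    A network whose weights change smoothly from layer to layer gives a small value, while weights that jump between layers give a large one.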
    NECO: NEural Collapse Based Out-of-distribution detection. (arXiv:2310.06823v2 [stat.ML] UPDATED)
    Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that ``neural collapse'', a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of ``neural collapse'' and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.  ( 2 min )
    Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing. (arXiv:2306.06599v4 [cs.LG] UPDATED)
    Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments in several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at \url{https://github.com/Wang-ML-Lab/variational-imbalanced-regression}.  ( 2 min )
    Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach. (arXiv:2310.07970v1 [cs.LG])
    Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.  ( 2 min )
    Robust 1-bit Compressed Sensing with Iterative Hard Thresholding. (arXiv:2310.08019v1 [cs.IT])
    In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from a minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements works for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.  ( 3 min )
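    A minimal sketch of a BIHT-style iteration for the measurement model $y = \mathrm{Sign}(\langle a, x\rangle)$ is given below: a gradient-like step on the sign mismatch, a hard threshold to the $k$ largest entries, then renormalization to the unit sphere. The step size, iteration count, zero initialization, and function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def biht(A, y, k, step=1.0, iters=100):
    """Estimate a k-sparse unit vector from sign measurements y = sign(A @ x).

    A is an (m, n) measurement matrix and y holds entries in {-1, +1}.
    """
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        # np.sign(0) is 0, so the first step from x = 0 reduces to A.T @ y / m.
        mismatch = y - np.sign(A @ x)
        x = hard_threshold(x + (step / m) * (A.T @ mismatch), k)
        norm = np.linalg.norm(x)
        if norm > 0:
            x = x / norm  # project back onto the unit sphere
    return x
```

    To try it, draw a Gaussian matrix with `np.random.default_rng(0).standard_normal((m, n))`, form `y = np.sign(A @ x_true)` for a sparse unit vector `x_true`, optionally flip a small fraction of the signs, and compare `biht(A, y, k)` with `x_true`.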
    Efficient probabilistic reconciliation of forecasts for real-valued and count time series. (arXiv:2210.02286v3 [stat.ML] UPDATED)
    Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.  ( 2 min )
    Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v2 [stat.ML] UPDATED)
    In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.  ( 2 min )
    Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. (arXiv:2310.08115v1 [econ.EM])
    Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notably, this property allows analysts to use the multiplier bootstrap to select covariates and models without sacrificing validity even if the true model is not included. Furthermore, if the conditional distributions are estimated at semiparametric rates, our approach matches the performance of an oracle with perfect knowledge of the outcome model. Finally, we propose an efficient computational framework, enabling implementation on many practical problems in causal inference.  ( 2 min )
    Generative modeling of time-dependent densities via optimal transport and projection pursuit. (arXiv:2304.09663v2 [stat.ML] UPDATED)
    Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.  ( 3 min )
    Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v1 [cs.LG])
    We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.  ( 3 min )
    Conformal inference for regression on Riemannian Manifolds. (arXiv:2310.08209v1 [stat.ML])
    Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices and data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.  ( 2 min )
    On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks. (arXiv:2310.08150v1 [math.ST])
    Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariance function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics hold true. As applications we discuss long-only minimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis and the analysis of array-of-sensors data.  ( 2 min )
    Local Graph Clustering with Noisy Labels. (arXiv:2310.08031v1 [cs.LG])
    The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet little effort has been devoted to developing fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of a graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.  ( 3 min )
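The idea of diffusing from a seed node on a label-reweighted graph can be illustrated with a toy sketch. The abstract does not state the paper's weighting scheme or diffusion, so everything here is an assumption made for illustration: the `boost` factor, the rule of upweighting edges whose endpoints both carry label 1, and the use of personalized PageRank as the diffusion.

```python
import numpy as np

def label_weighted_adjacency(adjacency, labels, boost=4.0):
    """Upweight edges whose endpoints both carry the (noisy) label 1.

    The weighting rule and `boost` are illustrative assumptions.
    """
    labels = np.asarray(labels, dtype=float)
    return adjacency * (1.0 + boost * np.outer(labels, labels))

def personalized_pagerank(weights, seed, alpha=0.15, iterations=200):
    """Power iteration for a seed-restarted random walk on a weighted graph."""
    transition = weights / weights.sum(axis=1, keepdims=True)  # row-stochastic
    restart = np.zeros(weights.shape[0])
    restart[seed] = 1.0
    scores = restart.copy()
    for _ in range(iterations):
        scores = alpha * restart + (1.0 - alpha) * scores @ transition
    return scores

# Toy graph: two 4-node cliques (target cluster 0..3, other cluster 4..7)
# joined by the single edge 3-4.
adjacency = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adjacency[i, j] = 1.0
adjacency[3, 4] = adjacency[4, 3] = 1.0

# Noisy labels: node 2 (inside the target) and node 5 (outside it) are flipped.
noisy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

plain = personalized_pagerank(adjacency, seed=0)
weighted = personalized_pagerank(
    label_weighted_adjacency(adjacency, noisy_labels), seed=0
)
mass_plain = plain[:4].sum()        # diffusion mass on the target cluster
mass_weighted = weighted[:4].sum()  # same, on the label-weighted graph
```

In this toy case the reweighting lowers the probability of leaving the target cluster through the bridge edge, so more diffusion mass stays on nodes 0..3 even though two labels are wrong.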
    Evaluation of ChatGPT-Generated Medical Responses: A Systematic Review and Meta-Analysis. (arXiv:2310.08410v1 [stat.ME])
    Large language models such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in medicine and provide direction for future research. We searched ten medical literature databases on June 15, 2023, using the keyword "ChatGPT". A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. The analysis showed that ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I2 = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. Moreover, many studies failed to report methodological details, including the version of ChatGPT and whether each question was used independently or repeatedly. Our findings revealed that although ChatGPT demonstrated considerable potential for application in healthcare, the heterogeneity of the studies and insufficient reporting may affect the reliability of these results. Further well-designed studies with comprehensive and transparent reporting are needed to evaluate ChatGPT's performance in medicine.  ( 2 min )
    Extensions of Heterogeneity in Integration and Prediction (HIP) with R Shiny Application. (arXiv:2310.08426v1 [stat.ME])
    Multiple data views measured on the same set of participants are becoming more common and have the potential to deepen our understanding of many complex diseases by analyzing these different views simultaneously. Equally important, many of these complex diseases show evidence of subgroup heterogeneity (e.g., by sex or race). HIP (Heterogeneity in Integration and Prediction) is among the first methods proposed to integrate multiple data views while also accounting for subgroup heterogeneity to identify common and subgroup-specific markers of a particular disease. However, HIP is only applicable to continuous outcomes and requires programming expertise by the user. Here we propose extensions to HIP that accommodate multi-class, Poisson, and Zero-Inflated Poisson outcomes while retaining the benefits of HIP. Additionally, we introduce an R Shiny application, accessible on shinyapps.io at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/, that provides an interface with the Python implementation of HIP to allow more researchers to use the method anywhere and on any device. We applied HIP to identify genes and proteins common and specific to males and females that are associated with exacerbation frequency. Although some of the identified genes and proteins show evidence of a relationship with chronic obstructive pulmonary disease (COPD) in existing literature, others may be candidates for future research investigating their relationship with COPD. We demonstrate the use of the Shiny application with publicly available data. An R package for HIP will be made available at https://github.com/lasandrall/HIP.  ( 3 min )
    Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts. (arXiv:2310.05898v2 [cs.LG] UPDATED)
    Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.  ( 3 min )
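The bound constraint $\|x\|_\infty \leq 1/\lambda$ can be seen on a toy problem. The sketch below uses the commonly published form of the Lion update (sign of an interpolated momentum plus decoupled weight decay). The quadratic loss, the hyperparameter values, and the function names are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, weight_decay=1.0):
    """One Lion update with decoupled weight decay (hyperparameters are illustrative)."""
    c = beta1 * m + (1.0 - beta1) * grad               # interpolated direction
    x_new = x - lr * (np.sign(c) + weight_decay * x)   # sign step plus decay
    m_new = beta2 * m + (1.0 - beta2) * grad           # momentum update
    return x_new, m_new

def run_lion(target, steps=2000, lr=0.01, weight_decay=1.0):
    """Minimize the toy loss 0.5 * ||x - target||^2 starting from x = 0."""
    x = np.zeros_like(target)
    m = np.zeros_like(target)
    for _ in range(steps):
        grad = x - target
        x, m = lion_step(x, m, grad, lr=lr, weight_decay=weight_decay)
    return x

weight_decay = 1.0                    # lambda, so the box is |x_i| <= 1
target = np.array([5.0, -3.0, 0.2])   # first two minimizers lie outside the box
x_final = run_lion(target, weight_decay=weight_decay)
```

Each step maps $x$ to $(1 - \eta\lambda)x - \eta\,\text{sign}(c)$ with step size $\eta$, so if $|x_i| \leq 1/\lambda$ before the step and $\eta\lambda \leq 1$, the same bound holds after it. In the toy run the first two coordinates settle at the box boundary rather than at the unconstrained minimizers 5 and -3.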
    Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining. (arXiv:2310.08566v1 [cs.LG])
    Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.  ( 2 min )
    Variable Selection for Kernel Two-Sample Tests. (arXiv:2302.07415v3 [stat.ML] UPDATED)
    We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to distinguish samples from two groups. To solve this problem, we propose a framework based on the kernel maximum mean discrepancy (MMD). Our approach seeks a group of variables with a pre-specified size that maximizes the variance-regularized MMD statistics. This formulation also corresponds to the minimization of asymptotic type-II error while controlling type-I error, as studied in the literature. We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees for different choices of kernel functions. Furthermore, we provide a statistical testing power analysis of our proposed framework. Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.  ( 2 min )
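For readers unfamiliar with MMD, a minimal sketch of scoring a candidate variable subset follows. It computes the plain biased (V-statistic) estimate of squared MMD with a Gaussian kernel, not the variance-regularized statistic or the mixed-integer formulations the abstract describes. The bandwidth, data, and function names are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, variables, bandwidth=1.0):
    """Biased estimate of squared MMD restricted to the chosen variable indices."""
    xs, ys = x[:, variables], y[:, variables]
    kxx = gaussian_kernel(xs, xs, bandwidth).mean()
    kyy = gaussian_kernel(ys, ys, bandwidth).mean()
    kxy = gaussian_kernel(xs, ys, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(size=(200, 3))
y[:, 0] += 2.0  # only variable 0 differs between the two groups

score_informative = mmd2_biased(x, y, [0])
score_uninformative = mmd2_biased(x, y, [1])
```

Here the subset containing the shifted variable scores much higher than a subset containing an unshifted one, which is the signal a selection procedure would exploit.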
    Lattice real-time simulations with learned optimal kernels. (arXiv:2310.08053v1 [hep-lat])
    We present a simulation strategy for the real-time dynamics of quantum fields, inspired by reinforcement learning. It builds on the complex Langevin approach, which it amends with system-specific prior information, a necessary prerequisite to overcome this exceptionally severe sign problem. The optimization process underlying our machine learning approach is made possible by deploying inherently stable solvers of the complex Langevin stochastic process and a novel optimality criterion derived from insight into so-called boundary terms. This conceptual and technical progress allows us both to significantly extend the range of real-time simulations in 1+1d scalar field theory beyond the state-of-the-art and to avoid discretization artifacts that plagued previous real-time field theory simulations. Limitations and promising future directions are discussed.  ( 2 min )
    On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism. (arXiv:2310.07852v1 [stat.ML])
    We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.  ( 2 min )
    Learning Regularized Monotone Graphon Mean-Field Games. (arXiv:2310.08089v1 [cs.GT])
    This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $\lambda$-regularized GMFG (for $\lambda\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($\lambda=0$) and $\lambda$-regularized MFGs, which are special cases of GMFGs. Second, we propose provably efficient algorithms to learn the NE in weakly monotone GMFGs, motivated by Lasry and Lions [2007]. Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions. Furthermore, we develop and analyze the action-value function estimation procedure during the online learning process, which is absent from algorithms for monotone GMFGs. This serves as a sub-module in our optimization algorithm. The efficiency of the designed algorithm is corroborated by empirical evaluations.  ( 2 min )
    Efficient Integrators for Diffusion Generative Models. (arXiv:2310.07894v1 [cs.LG])
    Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at https://github.com/mandt-lab/PSLD.  ( 2 min )
    How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? (arXiv:2310.08391v1 [stat.ML])
    Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of independent tasks. Furthermore, we prove that the pretrained model closely matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by achieving nearly Bayes optimal risk on unseen tasks under a fixed context length. These theoretical findings complement prior experimental research and shed light on the statistical foundations of ICL.  ( 2 min )
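The ridge-regression baseline that the pretrained model is compared against can be sketched in a few lines. For a Gaussian prior $w \sim N(0, \psi^2 I)$ and noise variance $\sigma^2$, the posterior mean is ridge regression with penalty $\sigma^2/\psi^2$. The symbols $\psi$ and $\sigma$ and the function name are notation assumed here, not taken from the abstract.

```python
import numpy as np

def ridge_predict(x_context, y_context, x_query, ridge_lambda):
    """Predict y for x_query from in-context examples via closed-form ridge regression."""
    d = x_context.shape[1]
    gram = x_context.T @ x_context + ridge_lambda * np.eye(d)
    w_hat = np.linalg.solve(gram, x_context.T @ y_context)
    return x_query @ w_hat

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
x_context = rng.normal(size=(20, 3))
y_context = x_context @ w_true          # noiseless context for a simple check
x_query = rng.normal(size=(5, 3))

prediction = ridge_predict(x_context, y_context, x_query, ridge_lambda=1e-10)
```

With noiseless data and a vanishing penalty the prediction matches `x_query @ w_true`. With noisy data, setting `ridge_lambda` to the ratio of noise variance to prior variance gives the Bayes-optimal predictor under the stated Gaussian assumptions.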
    Personalised dynamic super learning: an application in predicting hemodiafiltration's convection volumes. (arXiv:2310.08479v1 [stat.ME])
    Obtaining continuously updated predictions is a major challenge for personalised medicine. Leveraging combinations of parametric regressions and machine learning approaches, the personalised online super learner (POSL) can achieve such dynamic and personalised predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalised or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.  ( 2 min )
    L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes. (arXiv:1908.04628v3 [cs.LG] UPDATED)
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.  ( 3 min )
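The placing phase can be sketched as follows. The abstract does not give the exact placement or value-assignment rule, so this sketch makes its own choices: `prefers` is a hypothetical stand-in for the trained pairwise classifier, the new instance is ranked by how many known instances it is predicted to exceed, and it receives the midpoint of the two neighbouring known outcomes.

```python
def place(new_item, known_items, known_outcomes, prefers):
    """Estimate an outcome by placing new_item among known instances.

    prefers(a, b) is a hypothetical stand-in for a trained pairwise
    classifier; it returns True when a is predicted to exceed b.
    """
    wins = sum(1 for item in known_items if prefers(new_item, item))
    ranked = sorted(known_outcomes)
    if wins == 0:
        return ranked[0]           # placed below every known instance
    if wins == len(ranked):
        return ranked[-1]          # placed above every known instance
    # Midpoint of the two neighbours is an illustrative assumption.
    return 0.5 * (ranked[wins - 1] + ranked[wins])

# Toy data: one scalar feature per item and a heavy-tailed outcome.
known_items = [1.0, 2.0, 3.0, 4.0]
known_outcomes = [10.0, 20.0, 400.0, 5000.0]

def toy_prefers(a, b):
    """Toy comparator: trusts the raw feature ordering."""
    return a > b

estimate_mid = place(3.5, known_items, known_outcomes, toy_prefers)
estimate_low = place(0.5, known_items, known_outcomes, toy_prefers)
estimate_high = place(10.0, known_items, known_outcomes, toy_prefers)
```

Because the estimate is anchored to observed outcomes near the placement, an instance ranked near the top inherits a large value instead of being pulled toward the bulk of the distribution.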
    Statistical Performance Guarantee for Selecting Those Predicted to Benefit Most from Treatment. (arXiv:2310.07973v1 [stat.ME])
    Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals, called exceptional responders, who are likely to benefit most from a treatment. A common approach consists of two steps. One first estimates the conditional average treatment effect or its proxy using an ML algorithm, and then determines the cutoff of the resulting treatment prioritization score to select those predicted to benefit most from the treatment. Unfortunately, these estimated treatment prioritization scores are often biased and noisy. Furthermore, utilizing the same data to both choose a cutoff value and estimate the average treatment effect among the selected individuals suffers from a multiple testing problem. To address these challenges, we develop a uniform confidence band for experimentally evaluating the sorted average treatment effect (GATES) among the individuals whose treatment prioritization score is at least as high as any given quantile value, regardless of how the quantile is chosen. This provides a statistical guarantee that the GATES for the selected subgroup exceeds a certain threshold. The validity of the proposed methodology depends solely on randomization of treatment and random sampling of units without requiring modeling assumptions or resampling methods. This widens its applicability to a wide range of other causal quantities. A simulation study shows that the empirical coverage of the proposed uniform confidence bands is close to the nominal coverage when the sample is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders with a statistical performance guarantee.  ( 3 min )
    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift. (arXiv:2310.08237v1 [stat.ML])
    Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.  ( 2 min )
    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains. (arXiv:2310.07838v1 [cs.LG])
    We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enabling the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.  ( 2 min )

  • Open

    Savage Dall-e 3 delivers "Average reddit post"
    submitted by /u/Zimmax [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com Researchers present LLark: A Multimodal Foundation Model for Music - an open-source instruction-tuned multimodal model for music understanding. LLark is trained entirely from open-source music data and models [Demo | Paper] Researchers released LLaVA-1.5. LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. LLaVA-1.5 achieved SoTA on 11 benchmarks, with just simple modifications to the original LLaVA and completed training in ~1 day on a single 8-A100 node [Demo | Paper | GitHub]. Voice AI platform ElevenLabs released AI Dubbing tool that enables users to automatically translate any audio in a video into a different language whil…
    The AI Boom Could Use a Shocking Amount of Electricity
    The rapid growth of artificial intelligence (AI) could lead to a significant increase in global electricity consumption, according to a peer-reviewed analysis published in Joule. The analysis estimates that if current trends continue, AI could drive the demand for electricity in data centers to consume at least 85.4 terawatt-hours annually, which is more than what many small countries use in a year. AI is energy-intensive, with both the training and inference phases requiring a significant amount of energy. The size of AI models, such as large language models, and the location of data centers also contribute to energy usage. Factors such as cooling requirements and the type of hardware used can impact energy consumption. Source : https://www.scientificamerican.com/article/the-ai-boom-could-use-a-shocking-amount-of-electricity/ submitted by /u/NuseAI [link] [comments]
    Lemur: Harmonizing Natural Language and Code for Language Agents
    Today's conversational bots like Claude and GPT can chat impressively but aren't great at complex planning or executing technical tasks. To overcome this, new research from HKU builds open-source AI agents that blend natural language and coding skills. They're called Lemur and Lemur-Chat. The researchers think achieving versatile real-world agents requires models that integrate both fluid natural language abilities and precise programming language control. Humans combine plain speech for higher-level goals with languages like Python when we need to plan intricately and execute exactly. AI needs both capacities too. But most existing models specialize in pure language or pure code. There's a separation that is limiting. The team created Lemur by pretraining the open-source Llama-2 on a massive mixed corpus with 10x more natural language than code. This improved its programming abilities while retaining conversational strength. Further instruction tuning optimized Lemur-Chat for following free-form directions in language. Experiments found Lemur surpassed specialized coding-only models like Codex in overall benchmarks. Lemur-Chat then exceeded Lemur by 15% after instruction tuning. More importantly, Lemur-Chat won 12/13 new "agent tests" designed to mimic real-world challenges needing both language and programming prowess. It beat alternatives at: Using tools like Python and Wikipedia to enhance reasoning Debugging code by leveraging error messages Improving the most from natural language feedback Exploring partially observable environments like cybersecurity and web browsing simulations. Lemur-Chat matched GPT-3.5 in many tests, closing the gap between commercial and open-source agents. TLDR: New open-source AI agents combine coding and language skills. Experiments show the combo unlocks more performance across technical challenges. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Henry Kissinger: The Path to AI Arms Control
    submitted by /u/ForeignAffairsMag [link] [comments]
    A 21-year-old won $40,000 for using AI to read the first word on a 2,000-year-old papyrus scroll buried by Mount Vesuvius
    submitted by /u/thisisinsider [link] [comments]
    "Special Announcement: John Carmack & Rich Sutton partner to accelerate development of AGI" | "Carmack and Sutton are deeply focused on developing a genuine AI prototype by 2030, including establishing, advancing, and documenting AGI signs of life"
    submitted by /u/Tao_Dragon [link] [comments]
    Dumbing down or wising up: how will generative AI change the way we think?
    submitted by /u/Jariiari7 [link] [comments]
    One-Minute Daily AI News 10/13/2023
    In a recent article published in the journal Nature, researchers developed AI Tool EVEscape, a tool to forecast which severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains have the highest potential to escape host immunity.[1] Microsoft seems to be working on the possible development of an artificial intelligence (AI) system that can understand and resolve customer support requests using natural language processing.[2] Google’s Search Generative Experience (SGE) will let you create images right from a text prompt starting Thursday.[3] The Biden administration is considering closing a loophole that gives Chinese companies access to American artificial intelligence (AI) chips through units located overseas, according to four people familiar with the matter.[4] Sources: [1] https://www.news-medical.net/news/20231012/EVScape-New-tool-to-forecast-which-SARS-CoV-2-variants-could-dodge-our-immunity.aspx [2] https://winbuzzer.com/2023/10/11/microsoft-gears-up-for-a-revolutionary-natural-language-customer-support-ai-xcxwbn/ [3] https://www.theverge.com/2023/10/12/23913337/google-ai-powered-search-sge-images-written-drafts [4] https://www.reuters.com/technology/biden-eyes-adding-ai-chip-curbs-chinese-companies-abroad-2023-10-13/ submitted by /u/Excellent-Target-847 [link] [comments]
    I’ve created an audiobook generator. Anyone got any books to test on it? Each character is given a different voice.
    Also, if anyone has someone who should be included as a voice actor, it can also clone voices. Idk, I need to make sure it works for a wide variety of books, as long as they don’t use ‘ for quotes, because the computer gets confused when words like “I’ve” use the same symbol. submitted by /u/Impossible_Belt_7757 [link] [comments]
    Check out the latest episode of my history podcast on the future of A.I.!
    submitted by /u/ErikSlader713 [link] [comments]
    Drew a picture in paint, threw it in hotpot, and it came out a stylish, halloweenish picture. Damn this stuff is amazing.
    submitted by /u/kipaxbooks [link] [comments]
  • Open

    [P] App for iOS and M1 macOS for image bounding box annotation
    ClassifyML is an application for creating specialised image datasets for use with an ML training algorithm. Simply import your chosen images into the app via file manager, drag'n'drop or the on device camera and create your bounding boxes and then export your images and JSON into a structured folder. LINK: https://apps.apple.com/app/classify-ml/id6461013113 https://preview.redd.it/dicsq9d3k1ub1.png?width=313&format=png&auto=webp&s=7976a61f599c658d948dec12db0b8ec93274ad93 https://preview.redd.it/3tswxdd3k1ub1.png?width=313&format=png&auto=webp&s=56ca30546984402f4dbba628b73732918e921758 https://preview.redd.it/y0xelmz3k1ub1.png?width=313&format=png&auto=webp&s=a755ea61bc247c6aacb61a31c700e4e80a1ed69f submitted by /u/LiamRogers99 [link] [comments]  ( 9 min )
    [D] What are the best resources for learning reinforcement learning?
    Recently I came across Open AI's Spinning Up Project, which seems to be well structured, but quite introductory. What are some resources you use for learning RL? submitted by /u/OwnAd9305 [link] [comments]  ( 9 min )
    [D] LLM for entity/scene recognition in a book?
    Hello, I'm looking for an open source LLM that can extract all the characters from an inputted book, and isolate passages with descriptive writing that involves imagery. Can anyone suggest me something? Thanks! submitted by /u/slomorosh [link] [comments]  ( 9 min )
    [P] Deploy and Run LLMs at the Edge: Use Code Llama to Generate a Dashboard in a Network Restricted Environment
    In this blog, we explore different definitions of “the edge,” and understand the factors driving AI/ML to the edge. We examine why the trends of LLMs and edge computing are intersecting now, and how teams can take advantage of their combined power today. We also demonstrate how LLMs can be used in an edge environment to generate insights for a real-world use case today. Consider a geologist working in a remote oil field who is responsible for building and analyzing 3D models of oil fields to determine production capacity and the impact on profitability. In this demo, we walk through how Code Llama, Chassisml.io, and Modzy could be used to build a dashboard that geologists could use to analyze well data in real-time in a remote, network restricted environment, allowing for LLM insights generated at the edge. Learn more: https://www.modzy.com/modzy-blog/deploy-and-run-llms-at-the-edge submitted by /u/modzykirsten [link] [comments]  ( 9 min )
    [D] ICLR submissions are out. Discussion thread
    https://openreview.net/group?id=ICLR.cc/2024/Conference submitted by /u/_puhsu [link] [comments]  ( 8 min )
    [D] Vscode issue
    I am running AutoTokenizer from transformers in VS Code. VS Code crashes, showing an error and not responding. I don't understand what's wrong. submitted by /u/ArtichokeOne5897 [link] [comments]  ( 8 min )
    "[P]" Utilizing Machine Learning Techniques for Document Digitalization Project
    Hey Guys, I am currently spearheading a project for a client in the insurance industry, with a primary objective being the digitalization of thousands of hardcopy contracts. The ultimate goal is to automatically extract particular information from these newly digital documents, namely "date", "insurance premium", "insurance type", and "contractor's name". However, I anticipate a level of variability in terms of exact terminology used, particularly with regards to "insurance premium" and "insurance type". (There is no handwritten text.) I am keen on sharing the methodology I intend to apply for this project and invite your invaluable feedback and suggestions: - Firstly, I'll execute the scanning/digitalization of the documents manually. - Post this, I plan to utilize Tesseract in combination with Python for the extraction of text from the preprocessed images. - I am considering using libraries such as NLTK or spaCy to preprocess this text (this will involve steps like lower casing, removing punctuation, etc.) - Finally, I plan to train a custom model for Named Entity Recognition (NER), to accommodate the potential semantic variations in entity labeling which are specific to entities like "insurance premium" and "insurance type". I would be immensely grateful if I could gain your insights on the above-proposed pipeline. Are there any glaring pitfalls I need to avoid or perhaps some improvements that I could incorporate? Your expert advice can certainly help ensure the success of this venture. Many thanks in anticipation for your time and valuable inputs! submitted by /u/Background_Thanks604 [link] [comments]  ( 9 min )
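A rule-based baseline can be useful next to the NER step described above, both to sanity-check OCR output and to pre-label training data. The following is only a minimal sketch under stated assumptions: the OCR text is already available as a string, and the label variants, currency markers, and date format are hypothetical placeholders, not taken from any real contract. It runs on the raw text, since dates and amounts depend on punctuation that lowercasing and punctuation stripping would remove.

```python
import re

# Hypothetical label variants; a real contract corpus would need its own list.
PREMIUM_LABELS = r"(?:insurance premium|annual premium|premium)"

# Assumed date layout: day, month, four-digit year, separated by '.', '/' or '-'.
DATE_PATTERN = re.compile(r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{4})\b")

# A label, an optional separator, an optional currency marker, then the amount.
PREMIUM_PATTERN = re.compile(
    PREMIUM_LABELS + r"\s*[:\-]?\s*(?:EUR|USD|\$|€)?\s*([\d.,]*\d)",
    re.IGNORECASE,
)


def extract_fields(ocr_text):
    """Return the first date and premium amount found, or None for each."""
    date = DATE_PATTERN.search(ocr_text)
    premium = PREMIUM_PATTERN.search(ocr_text)
    return {
        "date": date.group(1) if date else None,
        "insurance premium": premium.group(1) if premium else None,
    }


# Made-up OCR output, used only to exercise the patterns.
sample = "Contract date: 12.03.2019\nAnnual Premium: EUR 1,250.00\n"
fields = extract_fields(sample)
```

Documents where the rules return None, or disagree with the trained model, are natural candidates for manual review.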
    [News] AI & ML conference in San Francisco [Special discount code for this subreddit]
    I work for the database company SingleStore, and we are hosting an AI & ML conference in San Francisco on 17th of October, 2023. It is an in-person conference with an amazing speaker line-up, including Harrison Chase, co-founder and CEO of LangChain, and many more. We will have hands-on workshops, swag giveaways and much more. I don't know if it makes sense to share this, but I believe it might help some of you near San Francisco to go and meet the industry leaders and network with other data engineering folks. Use my discount coupon code 'PAVAN100OFF' to get 100% off the ticket price (the original ticket price is $199). Get your tickets now! submitted by /u/PavanBelagatti [link] [comments]  ( 9 min )
    Using RAG on CoreML version of Llama2 [P]
    Has anyone ever attempted this or finetuning before on the CoreML version? I’m currently trying to and I’m not even sure where to start tbh. CoreML version of Llama 2: https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml submitted by /u/Inside-Aromatic [link] [comments]  ( 9 min )
    [D] How is L1 regularization able to drive a coefficient to zero?
    Hi all, I’m studying the concepts of machine learning. However, I am stuck because I still don’t see how introducing a penalty using lasso regression can drive some parameter coefficients to zero. When doing the calculations, I only get the final value (ordinary least squares + penalty) and don’t directly see a coefficient value being reduced. I've looked at many materials and resources trying to explain this, but I still can't see how it's done. I think the important thing for me is seeing it going to zero or, at the very least, seeing it during calculation. Is there anyone that can help explain this better? Or, if you know of a formula that I can derive that, during the derivation process, shows a coefficient being reduced or set to zero, that would also help. Also, any good resources on the topic would be appreciated. Edit: This post should have been posted in r/learnmachinelearning; here is a link to the same post in that subreddit. submitted by /u/thismymind [link] [comments]  ( 9 min )
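One way to see a coefficient actually reach zero is the one-coefficient case. The sketch below assumes orthonormal features, so that the lasso objective splits into independent problems of the form 0.5*(w - z)^2 + lam*|w|, where z is the ordinary least squares estimate. The minimiser is the soft-thresholding rule: the L1 penalty subtracts a constant lam from |z|, and any estimate with |z| <= lam lands exactly on zero. A ridge-style penalty is shown for contrast; it only rescales z. The coefficient values and lam are made up for illustration.

```python
def soft_threshold(z, lam):
    """Minimiser of 0.5*(w - z)**2 + lam*abs(w): the one-coefficient lasso solution."""
    if z > lam:
        return z - lam   # shrink positive estimates by lam
    if z < -lam:
        return z + lam   # shrink negative estimates by lam
    return 0.0           # anything with |z| <= lam is set exactly to zero


def ridge_shrink(z, lam):
    """Minimiser of 0.5*(w - z)**2 + 0.5*lam*w**2: rescales z, never exactly zero."""
    return z / (1.0 + lam)


# Made-up OLS estimates and penalty strength.
ols_coefficients = [3.0, 0.4, -0.2, -1.5]
lam = 0.5

lasso = [soft_threshold(z, lam) for z in ols_coefficients]
ridge = [ridge_shrink(z, lam) for z in ols_coefficients]
```

Here the two small estimates (0.4 and -0.2) become exactly 0.0 under the L1 rule, while the ridge versions stay small but nonzero.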
    [D] How do you pre-pay OpenAI compute credit with university funds?
    I am an academic and I have some funding. However, I cannot just plug in my lab card with a recurring payment; procedures don't allow it. Is there a way to "top up" some compute credits on OpenAI accounts? Is anyone having the same problem? Thanks. submitted by /u/Jean-Porte [link] [comments]  ( 9 min )
    [R] Seeking Guidance on Efficiently Classifying and Cleansing Automotive Data with Python
    Hi, we are working on a project that involves dealing with messy automotive data, and are looking for guidance on possible approaches and tools. We aim to map messy supplier data of car makes/models to standardized values from our approved list. This requires handling various challenges like typos, varied specificity, and sometimes research-based mapping (e.g., using engine size and production year to ascertain a chassis code). eg: If a supplier provides 'BNW 316i saloon 1990-1994', (typo intentional) we would like to match it to our standardized value of 'BMW 3 Series (E36)'. Our old approach has been a combination of utilizing fuzzy matching for typos/basic matching and time consuming manual processing and verification. We have recently experimented with using GPT for providing guess…  ( 10 min )
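For the typo-handling part of the mapping described above, Python's standard-library `difflib` can serve as a first pass. This is a minimal sketch, not the poster's approach: the approved list, the assumption that the make is the first token, and the 0.6 cutoff are all illustrative. Resolving the model and chassis code (for example from production years) would need a separate lookup that is not shown.

```python
import difflib

# Hypothetical approved list; the real one would come from the standardized catalogue.
APPROVED_MAKES = ["BMW", "AUDI", "MERCEDES-BENZ", "VOLKSWAGEN"]


def normalise_make(raw_description, cutoff=0.6):
    """Map the first token of a supplier string to an approved make, or None."""
    tokens = raw_description.upper().split()
    if not tokens:
        return None
    # get_close_matches returns candidates whose similarity ratio is >= cutoff,
    # best match first; n=1 keeps only the top candidate.
    matches = difflib.get_close_matches(tokens[0], APPROVED_MAKES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


make = normalise_make("BNW 316i saloon 1990-1994")
```

Strings that return None are the ones worth routing to manual review or a slower research-based step.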
    [R] Lemur: Harmonizing Natural Language and Code for Language Agents
    Today's conversational bots like Claude and GPT can chat impressively but aren't great at complex planning or executing technical tasks. To overcome this, new research from HKU builds open-source AI agents that blend natural language and coding skills. They're called Lemur and Lemur-Chat. The researchers think achieving versatile real-world agents requires models that integrate both fluid natural language abilities and precise programming language control. Humans combine plain speech for higher-level goals with languages like Python when we need to plan intricately and execute exactly. AI needs both capacities too. But most existing models specialize in pure language or pure code. There's a separation that is limiting. The team created Lemur by pretraining the open-source Llama-2 on a massive mixed corpus with 10x more natural language than code. This improved its programming abilities while retaining conversational strength. Further instruction tuning optimized Lemur-Chat for following free-form directions in language. Experiments found Lemur surpassed specialized coding-only models like Codex in overall benchmarks. Lemur-Chat then exceeded Lemur by 15% after instruction tuning. More importantly, Lemur-Chat won 12/13 new "agent tests" designed to mimic real-world challenges needing both language and programming prowess. It beat alternatives at: Using tools like Python and Wikipedia to enhance reasoning Debugging code by leveraging error messages Improving the most from natural language feedback Exploring partially observable environments like cybersecurity and web browsing simulations. Lemur-Chat matched GPT-3.5 in many tests, closing the gap between commercial and open-source agents. TLDR: New open-source AI agents combine coding and language skills. Experiments show the combo unlocks more performance across technical challenges. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework
    Hi, we've just released a new version of AgileRL, our evolutionary hyperparameter optimisation framework built for RL that is 10x faster than SOTA. We've introduced PPO, Rainbow DQN, some sophisticated replay buffers, and also collaborated with the Farama Foundation to create some tutorials (more on the way). Please check it out and take it for a spin. We're also looking for contributors so get in touch if you would like to be involved! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [P] Free open-source ML observability course: starts October 16 🚀
    Hi everyone, I’m one of the creators of Evidently, an open-source (Apache 2.0) tool for production ML monitoring. We’ve just launched a free open course on ML observability that I wanted to share with the community. The course covers: 📚 Key concepts of ML monitoring and observability (data drift, data and model quality metrics, etc.) 🔡 Monitoring unstructured data (embeddings, texts, LLMs, etc.) 🛠 Different deployment architectures (batch ML monitoring jobs, near real-time ML monitoring, etc.) The course is free and open. All materials are public, with no sign-up required. You’ll work with open-source tools like Evidently, MLflow, Airflow, and Grafana. We’ve already published the first 12 videos with notes and code examples. We’ll add new lessons and deployment blueprints over the following weeks. The official course start date is October 16, 2023. You can also learn at your own pace. Course info and notes: https://learn.evidentlyai.com/ [Background] We’ve been working on Evidently since late 2020 and have spoken to 100s of data scientists, ML engineers, and ML platform teams in different industries. In this course, we tried to sum up answers to the frequent questions on the topic. It starts with high-level theoretical modules and goes to complete deployment blueprints. It is approachable for different levels of knowledge, and you can pick only the modules you are interested in. Looking forward to meeting you at the course! submitted by /u/mllena [link] [comments]  ( 9 min )
    Can I use ArcPro to do machine learning on point (numeric) data? [D] [R]
    I am trying to do machine learning in ArcPro, and I want to understand the relationship between x, y, numeric variable 1, numeric variable 2, and one nominal variable (classified; i.e. can be one of four values). I'd like to be able to predict numeric variable 1 based on everything else. Can ArcPro accommodate machine learning for anything other than raster type data? That is, can it be used to do machine learning on point (numeric) data? Thanks! submitted by /u/arcgis_123 [link] [comments]  ( 9 min )
    [R] TimeGPT : The first Generative Pretrained Transformer for Time-Series Forecasting
    In 2023, Transformers made significant breakthroughs in time-series forecasting. For example, earlier this year, Zalando proved that scaling laws apply in time-series as well, provided you have large datasets. (And yes, the 100,000 time series of M4 are not enough: the smallest 7B Llama was trained on 1 trillion tokens!) Nixtla curated a 100B dataset of time-series and trained TimeGPT, the first foundation model on time-series. The results are unlike anything we have seen so far. I published the results in my latest article. I hope the research will be insightful for people who work on time-series projects. Link: https://aihorizonforecast.substack.com/p/timegpt-the-first-foundation-model Note: If you know any other good resources on very large benchmarks for time series models, feel free to add them below. submitted by /u/nkafr [link] [comments]
    [R] Pointers to (deep) latent variable models that admit analytical approximations
    Hi everyone. I am aware that there is a plethora of deep generative models out there (e.g. variational autoencoders (VAE), GANs) that can model high-dimensional data as the images of latent variables under a non-linear mapping (typically a neural network). In more traditional methods such as probabilistic PCA, the latent variables can be marginalised analytically. In Bayesian PCA (BPCA), we can additionally integrate out the linear mapping, from the latent space to the observation space, by adopting the variational lower bound that leads to closed-form updates of the parameters. The Gaussian Process Latent Variable Model (GPLVM) adopts a non-linear probabilistic mapping (a Gaussian process) that can be marginalised. These two models enjoy, to a certain degree, analytical solutions concerning the inference of the latent variables and the mapping. I have been wondering whether there is any research into more "complex" models (perhaps I should call them deep) that are capable of modelling more complex data distributions than the GPLVM and BPCA, but retain analytical solutions when inferring the posterior of the latent variables (like BPCA) or the mapping (like GPLVM)? What I like about the GPLVM and BPCA is that they possess an objective function (i.e. ELBO) that can be analytically optimised, as opposed to the intractable objective of VAEs that necessitates Monte-Carlo averages and stochastic gradients. Could somebody please point me to such examples of more complex generative models that admit analytical inference for working out the posterior of the latent variables or the mapping? ----- This has also been posted on stack exchange: https://ai.stackexchange.com/q/42418/61537 submitted by /u/ngiann [link] [comments]  ( 9 min )
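For reference, the analytical marginalisation mentioned above for probabilistic PCA can be written out explicitly. The notation here (W for the linear mapping, mu for the mean, sigma^2 for the isotropic noise variance, q for the latent dimension) is the standard textbook one and is an assumption, since the post does not fix symbols.

```latex
% Probabilistic PCA: linear-Gaussian latent variable model
\begin{align}
  z &\sim \mathcal{N}(0, I_q), \qquad
  x \mid z \sim \mathcal{N}(W z + \mu,\; \sigma^2 I_d) \\
  % Marginal likelihood: the latent variable integrates out in closed form
  p(x) &= \int p(x \mid z)\, p(z)\, dz
        = \mathcal{N}\!\left(x \mid \mu,\; W W^\top + \sigma^2 I_d\right) \\
  % Posterior over the latent variable is also Gaussian
  p(z \mid x) &= \mathcal{N}\!\left(z \mid M^{-1} W^\top (x - \mu),\; \sigma^2 M^{-1}\right),
  \qquad M = W^\top W + \sigma^2 I_q
\end{align}
```

Both results follow from the fact that x is a linear function of z plus Gaussian noise, which is exactly the property a non-linear neural mapping breaks.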
    [D] I love teaching! But I don't have enough publications for it, what should I do?
    Do I love teaching? Oh, absolutely, YES, a big YES! My time as a TA for countless semesters has been amazing. Staying after hours, spending long evenings and early mornings to help each of my students find ease in debugging both easy-peasy and mind-boggling programs – it’s been a joy, truly. Watching those fresh faces, whom I introduced to Python in their first year (intro to programming lab), now immerse themselves in my computer vision labs, exploring computer vision and deep learning in their third/fourth year – it’s incredibly rewarding! And yeah, my students kind of like me! After each semester I get tons of emails thanking me, and my TAship reviews are always good. But, ugh, do I have enough publications to become faculty? A big fat NO! My efforts have been relentless, and everyone in my department would nod in agreement. But luck and reviewers? Not my best pals, apparently. So yeah, I don’t have a stack of 8 top-tier papers. I’ve managed to scrape together 3, and a few second-tier ones. My citation count is not that bad, somewhere between 200 and 300-ish. Now, what’s next for me? Dive into industry? Become a high school teacher? Or perhaps go on a postdoc journey, fingers crossed for a sprinkle more luck and a few more papers? Edit: This doesn't mean I don't like research. I actually love it too. I have done quite a few internships at quite big companies; most of the time they extended my internship, and I even got a publication out of one in 5 months. But I just like to teach a lot! Strangely, I get social anxiety everywhere other than my classrooms/labs. submitted by /u/LongjumpingSchool646 [link] [comments]  ( 9 min )
    [D] You don't need a Vector Database you just need a database
    I'm seeing some architectures come out from the LLM world that probably wouldn't survive the trip to production. If you choose a vector database how will you handle your other database needs? Then you'll need 2 databases. https://bionic-gpt.com/blog/you-dont-need-a-vector-database/ submitted by /u/purton_i [link] [comments]  ( 9 min )
    [D] Why is back-propagation intractable for the MoCo key encoder?
    In the original MoCo paper, it says: "Using a queue can make the dictionary large, but it also makes it intractable to update the key encoder by back-propagation (the gradient should propagate to all samples in the queue)." At first I thought the main reason BP cannot be applied to the key encoder is that the queue operation is not differentiable, but that does not seem to be true: you can compute the gradient for all samples in the queue, and then BP should work properly (see the code at the bottom). So what is the real reason that BP is intractable for the key encoder? In my opinion, it may be the large size of the queue (dictionary), which makes memory usage explode.

```python
import torch
import torch.nn as nn

q = nn.Linear(768, 128)  # query encoder
k = nn.Linear(768, 128)  # key encoder
bs = 64    # batch size
ks = 4095  # queue (dictionary) size
model = nn.ModuleList([q, k])
x = torch.randn(bs, 768)
optim = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()


def forward(x):
    xq = q(x)
    xk = k(x + 0.1)
    # Random tensor standing in for the queue; no gradient flows through it.
    que = torch.rand(ks, 128)
    pos = torch.einsum("nc,nc->n", xq, xk)
    neg = torch.einsum("nc,kc->nk", xq, que)
    out = torch.cat([pos.unsqueeze(-1), neg], dim=1)
    t = torch.zeros(out.shape[0], dtype=torch.long)  # the positive sits at index 0
    return criterion(out, t)


loss = forward(x)
loss.backward()
optim.step()
```

    submitted by /u/whishtLF [link] [comments]  ( 9 min )
    [D] Advisor rejects every idea I propose.
    I'm a senior PhD student at a moderately famous university. I have a reasonable number of accepted papers as first author in tier-1 conferences. I was thinking of going into academia, so recently I started proposing many ideas to my advisor so that I can mentor some junior students. However, my advisor is rejecting every idea I suggest, saying it won’t work. I’m feeling very dejected and I feel like I should give up going into academia. I don’t know what I’m expecting from here. Is your advisor like this too? submitted by /u/mildlyphd [link] [comments]  ( 9 min )
  • Open

    Batch calibration: Rethinking calibration for in-context learning and prompt engineering
    Posted by Han Zhou, Student Researcher, and Subhrajit Roy, Senior Research Scientist, Google Research Prompting large language models (LLMs) has become an efficient learning paradigm for adapting LLMs to a new task by conditioning on human-designed instructions. The remarkable in-context learning (ICL) ability of LLMs also leads to efficient few-shot learners that can generalize from few-shot input-label pairs. However, the predictions of LLMs are highly sensitive and even biased to the choice of templates, label spaces (such as yes/no, true/false, correct/incorrect), and demonstration examples, resulting in unexpected performance degradation and barriers for pursuing robust LLM applications. To address this problem, calibration methods have been developed to mitigate the effects of t…  ( 93 min )
  • Open

    Significance of AI in the development of software products
    Artificial Intelligence (AI) is emerging as a formidable force, revolutionizing how we conceive, create, and deliver software solutions. As technology advances at an unprecedented pace, the role of AI in this domain has become increasingly significant. It’s no longer just a buzzword; it’s a fundamental tool that promises to reshape the entire software development process.… Read More »Significance of AI in the development of software products The post Significance of AI in the development of software products appeared first on Data Science Central.  ( 19 min )
    Future of AI and data science – How to secure a bright career
    Companies, more often, pay attention to automation and innovation over proficiency and productivity. However, firms can maintain a balance between both due to the extensive usage of AI and data science programs. Here are the stats that show the impact of AI and data science in diverse sectors: Applications of AI and data science have… Read More »Future of AI and data science – How to secure a bright career The post Future of AI and data science – How to secure a bright career appeared first on Data Science Central.  ( 21 min )
  • Open

    A question
    What are the ways to create plasticity in a neural network without using weights, biases and activation functions? submitted by /u/Sith_vader3 [link] [comments]  ( 8 min )
    Neural Networks project
    Hi! My group (4 people) has chosen to make an application that translates ancient stone inscriptions to modern languages as our university project. We can use external libraries to process the images we are going to translate, but as we understand it, we have to build the neural network ourselves from scratch. My questions are: 1) Is this possible to do within 10 months? 2) If so, how would you approach it? submitted by /u/sakith123 [link] [comments]
  • Open

    From Skylines to Streetscapes: How SHoP Architects Brings Innovative Designs to Life
    At SHoP Architects, a New York City-based architectural firm, Mengyi Fan and her team aim to inspire industry professionals to create visual masterpieces by incorporating emerging technologies. Fan, the director of visualization at SHoP, has expertise that spans the fields of architectural visualization and design. She takes a definitive, novel and enduring approach to designing Read article >  ( 6 min )
  • Open

    Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework
    Hi, we've just released a new version of AgileRL, our evolutionary hyperparameter optimisation framework built for RL that is 10x faster than SOTA. We've introduced PPO, Rainbow DQN, some sophisticated replay buffers, and also collaborated with the Farama Foundation to create some tutorials (more on the way). Please check it out and take it for a spin. We're also looking for contributors so get in touch if you would like to be involved! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]
    Masking state transitions in policy updates for invalid actions?
    I am currently dealing with an environment that, most of the time (90% of all state transitions), clips the action selected by the agent, sometimes to the point where the selected action is completely ignored. This causes a lot of problems. For example, the entropy bonus does not work, since the agent learns to select any action when it doesn't matter anyway, but selects the same action (low entropy) when the actions have an effect. Using the PPO algorithm, I was thinking of masking the state transitions in the policy updates according to how much the action was clipped in the environment. And I thought V(s) should not be masked, because it can still learn from the state transitions even if the action was effectively ignored by the environment. submitted by /u/flxh13 [link] [comments]
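The masking idea above could be sketched as a weighted version of PPO's clipped surrogate. This is an illustration under assumptions, not a tested recipe: `weights` is a hypothetical per-transition value in [0, 1] (for example 1 minus the fraction of the action the environment clipped away), and plain Python lists stand in for tensors. Only the policy term is weighted here; the value loss can keep using every transition, following the post's reasoning.

```python
def masked_ppo_policy_loss(ratios, advantages, weights, clip_eps=0.2):
    """Clipped PPO surrogate, averaged with per-transition weights in [0, 1]."""
    total, weight_sum = 0.0, 0.0
    for ratio, adv, w in zip(ratios, advantages, weights):
        # Standard PPO clipping of the probability ratio pi_new / pi_old.
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        surrogate = min(ratio * adv, clipped_ratio * adv)
        # Transitions whose action was (mostly) ignored contribute little or nothing.
        total += w * surrogate
        weight_sum += w
    if weight_sum == 0.0:
        return 0.0  # every transition was masked out
    return -total / weight_sum  # negated because optimisers minimise


# Made-up batch: the second transition is fully masked, the third half-masked.
loss = masked_ppo_policy_loss(
    ratios=[1.0, 1.5, 0.5],
    advantages=[1.0, 1.0, -1.0],
    weights=[1.0, 0.0, 0.5],
)
```

Normalising by the weight sum rather than the batch size keeps the gradient scale comparable when most of a batch is masked; whether that is desirable with 90% of transitions clipped is a design choice worth testing. The same weights could be applied to the entropy bonus.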
    A question about deterministic action selection at evaluation time
    I'm training some agents using fairly vanilla PPO on a hand-made environment. These agents learn to perform the task pretty well, but while I was examining their action probabilities during an evaluation episode, I had the idea to turn off deterministic action selection. To my surprise, allowing probabilistic action selection (as opposed to argmax action selection) actually improved performance in some cases. I had always thought that deterministic actions during evaluation was fairly standard, but now am thinking that maybe I missed something and that there are cases where you wouldn't want determinism? My question is: how common is it actually to use deterministic actions vs. probabilistic ones at evaluation time, and does anyone know of studies/papers/examples where the authors found probabilistic evaluation to outperform determinism? submitted by /u/Impallion [link] [comments]
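The two evaluation modes being compared can be made concrete with a small helper. This is a generic sketch that assumes a discrete action space and a policy that outputs a list of action probabilities; the function name and arguments are illustrative.

```python
import random


def select_action(probs, deterministic, rng=random):
    """Pick an action index from a list of action probabilities."""
    if deterministic:
        # Greedy evaluation: always take the most probable action.
        return max(range(len(probs)), key=lambda a: probs[a])
    # Stochastic evaluation: sample in proportion to the policy's probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


greedy_action = select_action([0.1, 0.7, 0.2], deterministic=True)
sampled_action = select_action([0.1, 0.7, 0.2], deterministic=False, rng=random.Random(0))
```

Running the same evaluation episodes under both settings, with several seeds for the sampled mode, gives a direct comparison for a given environment.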
    "A Simple Open-Loop Baseline for Reinforcement Learning Locomotion Tasks" Raffin et al. 2023
    submitted by /u/atooo57 [link] [comments]
    Looking for some advice regarding universal multi-head outputs
    Hey, so I am working on a reinforcement learning package in C# (currently under heavy development): https://github.com/asieradzk/RL_Matrix My goal is to create something superior to Unity's ML Agents for Godot, to democratize access to reinforcement learning for people (without having them know what a tensor is). So far I've added some barebones DQN and PPO (that only output a single discrete action) as a proof of concept to test my code architecture. So I am going through the daunting task of building a universal workflow for setting up environments, for observations of any shape and any number of actions, both discrete and continuous. As I am finishing my multi-head multi-action output, I've come to realise that there are many possible architectures for setting up multi-head outputs, for instanc…
    Next state in turn based game
    To my knowledge, when using an algorithm from the Q-learning family, we must know the next state as well as the action space that goes with that observation in order to evaluate the value of the next state with the target network. But I have a problem when trying to define this next state in a turn-based game in which the agent has to make a certain number of actions and then wait for the opponent to take some actions before it can interact with the environment again. We can take Hearthstone as an example: each player has to wait for the other to play a number of cards before they can take any action. Currently, I have two options for this: - Take the environment state right after the agent's turn has ended, which will lack the action space. - Take the environment state just before the agent's turn begins, which will have all the actions available to choose from, but this will make the outcome of the agent's last action very noisy. That state could be a good one if the opponent plays badly, or the opponent could be very good and make our last decision seem like a very bad choice. Thanks in advance for any suggestions. If my problem is a common task that others have already solved many times, I would be very thankful for the keyword. submitted by /u/No-Concentrate-6037 [link] [comments]
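The second option above (next state taken at the agent's next decision point) can be sketched as a small recorder that holds each (state, action) pair open until the agent is asked to act again. The class and method names are hypothetical. The opponent's moves are treated as part of the environment's stochastic dynamics, which is where the noise the post mentions comes from; rewards that arrive in between are accumulated without discounting for simplicity.

```python
class TurnTransitionRecorder:
    """Pairs each agent action with the state seen at the agent's next decision point."""

    def __init__(self):
        self.pending = None      # (state, action) still waiting for its next state
        self.reward_acc = 0.0    # reward collected since the pending action
        self.transitions = []    # completed (s, a, r, s_next, legal_next, done) tuples

    def on_agent_decision(self, state, legal_actions, action):
        # The state at this decision point closes the previous pending transition,
        # so s_next always comes with a usable set of legal actions.
        if self.pending is not None:
            s, a = self.pending
            self.transitions.append(
                (s, a, self.reward_acc, state, list(legal_actions), False)
            )
        self.pending = (state, action)
        self.reward_acc = 0.0

    def on_reward(self, reward):
        # Called for any reward, including those arriving during the opponent's turn.
        self.reward_acc += reward

    def on_episode_end(self, final_state):
        if self.pending is not None:
            s, a = self.pending
            self.transitions.append((s, a, self.reward_acc, final_state, [], True))
        self.pending = None
        self.reward_acc = 0.0


# Made-up episode: two actions in one turn, an opponent turn, then a final action.
rec = TurnTransitionRecorder()
rec.on_agent_decision("s0", ["play", "end"], "play")
rec.on_agent_decision("s1", ["end"], "end")
rec.on_reward(-1.0)  # something happened during the opponent's turn
rec.on_agent_decision("s2", ["play", "end"], "play")
rec.on_reward(1.0)
rec.on_episode_end("terminal")
```

Within a turn the next state is simply the board after the previous card; only the end-of-turn action waits across the opponent's move.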
    "Small batch deep reinforcement learning", Obando-Ceron et al 2023 {DM} (value-based agents explore & regularize better with small n)
    submitted by /u/gwern [link] [comments]

  • Open

    How are memories stored in neural networks? | The Hopfield Network #SoME2
    submitted by /u/keghn [link] [comments]
    A question
    How does a neural network process inputs that are the same but are presented differently to the model? submitted by /u/Sith_vader3 [link] [comments]
    I don't know much about NNs. Is this correct?
    I gave ChatGPT vision an illustration of a neural network from The Principles of Deep Learning Theory. I want to know how correct its response is. Here is the response: https://preview.redd.it/inqe5xukxptb1.png?width=453&format=png&auto=webp&s=6e1079baeae8235b0e03a677e4006d1077af36a8 submitted by /u/YeshwanthRam [link] [comments]
  • Open

    Who Will Benefit from AI?
    Artificial intelligence (AI) can provide "machine usefulness" for human workers, augmenting their jobs rather than replacing them. However, there is a concern that AI could lead to job displacement and reinforce economic inequality. MIT economist Daron Acemoglu emphasizes the importance of making AI more useful to humans and ensuring that the economic benefits are shared widely. He suggests that innovations that augment workers' tasks can lead to prosperity for the workforce. Acemoglu also highlights the need for worker power and the careful implementation of technology to achieve shared prosperity and productivity gains. Source : https://idss.mit.edu/news/who-will-benefit-from-ai/ submitted by /u/NuseAI [link] [comments]
    What's the most advanced free chatbot available?
    I just need three things from it: It must be knowledgeable about things such as physics, math, history, books, geography, etc. It must also be original, with high SEO and AI-detection scores. It must be available in Italy. The last part is essential. Claude 2 is very famous, but it requires SMS verification from the USA (which I don't have, and I don't want to give credit card info or pay to get one), which makes it almost impossible to use even with a VPN. submitted by /u/luigirovatti1 [link] [comments]
    10 Powerful ChatGPT Hacks for SEO
    submitted by /u/Senior_tasteey [link] [comments]
    ChatGPT's Global Peace Plan
    Creating true, enduring, lasting peace on Earth is an ambitious and complex endeavor that requires multifaceted approaches. Here’s a bold, outside-the-box plan that may surprise you: Step 1: Establish a Global Consciousness: Educational Overhaul: Revamp global educational systems to foster empathy, understanding, and appreciation for diverse cultures, religions, and viewpoints from a young age. Step 2: Eradicate Poverty and Inequality: Universal Basic Assets (UBA): Implement a Universal Basic Assets program, where every person on Earth is granted a share of global resources. Step 3: Create a Single Global Governance Entity: World Federation: Establish a democratically elected World Federation that respects regional autonomy but has overriding authority on global issues like…
    When your AI says she loves you
    submitted by /u/thisisinsider [link] [comments]
    Anyone ever thought about training a video generating model, but backwards?
    Just had a random idea: What if you train a video generating AI, but feed it videos that are reversed? You could show it an image of a crashed car, and it would generate a video of the crash. Show it a broken vase, it would "repair" it. It could one day become like the "reconstruct crime scene" in Detroit: Become Human. What are your thoughts about this? submitted by /u/FluffyIllustrator805 [link] [comments]
    AI and science: what 1,600 researchers think
    A Nature survey of over 1,600 researchers reveals that AI tools are becoming increasingly common in science and are expected to be 'very important' or 'essential' in the next decade. Scientists express concerns about how AI is transforming research, including reliance on pattern recognition without understanding, bias in data, fraud, and irreproducible research. The survey shows that AI tools provide faster ways to process data, speed up computations, and save time and money. Among researchers who use AI, more than one-quarter believe AI tools will become 'essential' to their field in the next decade. Large language models like ChatGPT are mentioned as both impressive and concerning examples of AI tools in science. Source : https://www.nature.com/articles/d41586-023-02980-0 submitted by /u/NuseAI [link] [comments]
    Looking for AI text input like Artbreeder Mixer that combines images
    I'm looking for a (free) AI image generator like Artbreeder Mixer that has functions that allow you to "morph" or mix images together via text prompts. I've looked at a bunch already, and even tried adding the text of the different types in the prompts, but I've been getting separated results (like "cat", "man", "head" won't combine the man and the cat, but rather give me un-morphed results, like a regular man plus a cat in a suit with no human features. I even got a result with a man standing behind a cat!). I've tried StarryAI, imagecreator, wepik (can't afford Midjourney or paid ones right now), and some others I can't remember, with no mixing... With Artbreeder's interface, you can keep adding images and it will mix them together. I made these images and others like them very easily in Artbreeder, but its plan is very limited. I could buy more credits, but I need to wait a few days (new job, not paid yet, broke today... lol): morph between man and donkey; morph between angry rapper and gorilla. So, if anyone can suggest some free, or almost free (generous newbie credits?) tools that can do mixes like this, please point me in the right direction. submitted by /u/magusat999 [link] [comments]
    New York wants to be AI's world capital, in rivalry with San Francisco and Silicon Valley
    submitted by /u/norcalnatv [link] [comments]
    Could an AI-created profile picture help you get a job?
    Artificial intelligence (AI) is being used to create professional-looking profile pictures for job hunting websites like LinkedIn. Apps like Remini, Try It On AI, and AI Suit Up use AI-based software to generate slick profile photos that mimic the work of expert photographers. Users upload multiple selfies, and the AI software creates artificial photos with different hairstyles, clothing, and backdrops. While some find the results realistic, others think they look artificial. The AI services are popular because they are cheap or free, making them accessible to those who can't afford professional headshots. However, opinions are divided on whether AI-generated photos are beneficial or detrimental to self-esteem. Some believe that AI-generated photos allow individuals to put their best self forward and potentially increase their chances of being considered for opportunities. Others worry that relying on AI-generated photos may negatively impact self-worth and confidence. Recruiters generally do not consider whether a photo is AI-generated when evaluating job applications. Source : https://www.bbc.co.uk/news/business-67054382 submitted by /u/NuseAI [link] [comments]
    AI Tool for film footage notes
    Hi, I'm currently filming a documentary, but I'm so busy filming that I don't have time to write notes on the footage for the editor. Does anyone know of an AI tool that can help with this, save time, and streamline the process? Kind regards submitted by /u/Brand0n_C [link] [comments]
    How will AI affect the traditional and open source software industry?
    Hey folks, how do you see the effect of AI? Will small software companies go bankrupt, since lots of software now uses tools like ChatGPT, Midjourney, etc.? This is just the start of a new era of AI technology that will evolve over the years. In that time we will see more and more AI software, which will likely provide more efficient and better solutions than traditional and open source software. So my question is: how do you see this? Are the days of small software companies and open source programs numbered? submitted by /u/Haziq12345 [link] [comments]
    One-Minute Daily AI News 10/11/2023
    Opera has launched Opera One — a new version of the browser that comes packaged with an AI-powered chatbot called Aria.[1] Adobe is going all in on AI, announcing three new generative AI models today that add powerful features to Illustrator and Adobe Express and vastly improve Photoshop’s text-to-image capabilities.[2] ‘South Park’ to Tackle AI for Next Event Special, Releases Teaser.[3] World’s first AI tutor launched in Australia to help students get through their exams.[4] Sources: [1] https://www.theverge.com/2023/6/21/23768888/opera-one-browser-aria-ai-assistant-chatbot [2] https://www.theverge.com/2023/10/10/23911114/adobe-max-firefly-generative-ai-model-photoshop-illustrator-express [3] https://www.hollywoodreporter.com/tv/tv-news/south-park-ai-joining-panderverse-1235615276/ [4] https://www.techguide.com.au/news/computers-news/worlds-first-ai-tutor-launched-in-australia-to-help-students-get-through-their-exams/ submitted by /u/Excellent-Target-847 [link] [comments]
    Cypher 2023: The Future of Simulation and Design is AI
    submitted by /u/Agitated-Spell3979 [link] [comments]
    Any ideas how this was created?
    submitted by /u/crispyTacoTrain [link] [comments]
    Web design tools
    I’m looking for input and advice on tools for web designers. I use WordPress a lot, Magento some, and frequently code by hand in HTML, JavaScript and PHP. I know there are some AI tools out there now, but I don’t know which are best and wanted to find out what people's thoughts are on this subject. What tools are you using, for what, and why? Thanks! submitted by /u/PowerTarget [link] [comments]
  • Open

    [R] Researchers Identify Emergent Linear Structures in How LLMs Represent Truth
    LLMs' tendency to make up false statements (hallucinate) is a major concern. We need ways to inspect whether they really "know" something is true or not so we can reduce hallucinations. In a new paper, researchers found that LLMs contain an internal "truth vector" - an emergent linear structure that represents factual truth values. They had the insight to visualize how GPT represents simple true/false sentences. The true ones clustered together, while false ones clustered elsewhere - suggesting some kind of 'truth direction' in its learned representations. To test this, they trained linear "probes" on one dataset, and found they could generalize to accurately detect truth values in totally different datasets about other topics. They also directly modified the models to add or subtract the identified truth vectors from its processing of statements. This could flip assessments of truth value, showing the vector causally influences reasoning. Together, these findings provide evidence that neural networks can create emergent, linear structures that represent factual truth. This finding could eventually help make AI systems less prone to hallucinations and falsehoods. TLDR: LLMs can create emergent linear representations of truth. This sheds light on how AI represents abstract concepts and could help us reduce hallucinations. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
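    The probing and steering steps described in the summary can be sketched on toy data. This is a minimal illustration and not the paper's code: the "activations" are synthetic Gaussian vectors from a hypothetical `make_synthetic` helper, the ground-truth axis is made up, and the probe is a simple difference-of-means direction chosen here for brevity.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_mass_mean_probe(true_acts, false_acts):
    """Return (direction, bias) for a difference-of-means linear probe."""
    mu_true = mean_vector(true_acts)
    mu_false = mean_vector(false_acts)
    direction = [t - f for t, f in zip(mu_true, mu_false)]
    midpoint = [(t + f) / 2 for t, f in zip(mu_true, mu_false)]
    # Bias places the decision boundary halfway between the two class means.
    bias = -sum(d * m for d, m in zip(direction, midpoint))
    return direction, bias

def probe_score(direction, bias, act):
    """Positive score means the probe labels the activation as 'true'."""
    return sum(d * a for d, a in zip(direction, act)) + bias

def steer(act, direction, alpha):
    """Add alpha * direction to an activation (the add/subtract intervention)."""
    return [a + alpha * d for a, d in zip(act, direction)]

def make_synthetic(rng, n, truth_axis, label):
    """Hypothetical stand-in for real activations: noise plus a +/-1 shift along truth_axis."""
    sign = 1.0 if label else -1.0
    return [[rng.gauss(0, 0.2) + sign * t for t in truth_axis] for _ in range(n)]

rng = random.Random(0)
dim = 8
truth_axis = [1.0 if i == 0 else 0.0 for i in range(dim)]  # made-up direction
train_true = make_synthetic(rng, 50, truth_axis, True)
train_false = make_synthetic(rng, 50, truth_axis, False)
direction, bias = fit_mass_mean_probe(train_true, train_false)

# Score a held-out "false" vector before and after adding the probe direction.
held_out_false = make_synthetic(rng, 1, truth_axis, False)[0]
before = probe_score(direction, bias, held_out_false)
after = probe_score(direction, bias, steer(held_out_false, direction, 2.0))
print(before < 0, after > 0)
```

    On real models the vectors would come from a chosen layer's hidden states, and the intervention would be applied during the forward pass rather than to a stored vector.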
    [D] Recommendations request for a guide to research publication
    I am working on a research topic in Data Engineering. Forgive me if this is a frequently asked question; I couldn't find this specifically in the FAQ. What are good publication tips and journals to publish in? I read through a few journals and all of them are big publications. What if I opt for some upcoming or other niche (maybe data engineering) journals? submitted by /u/Sherbhy [link] [comments]  ( 9 min )
    [R] SWE-bench: Can Language Models Resolve Real-world GitHub issues?
    We have a new benchmark out called SWE-bench (arxiv). It challenges LMs to solve real GitHub issues (feature requests & bug reports) from popular Python repos. Answers are validated using unit tests we crawled from those repos. The benchmark at swebench.com/ shows that even the strongest models, such as Claude 2 and GPT-4, get less than 5% accuracy. We are here to answer any questions you may have. submitted by /u/ofirpress [link] [comments]  ( 9 min )
    [D] Sample probability diffusion models
    I would like to understand how I can calculate the probability that a sample belongs to the distribution a diffusion model was trained on. Say I have an image of a car, and I would like to know whether this image belongs to the distribution that is estimated by the diffusion model. So I would like to know the probability, between zero and one, that the car belongs to this distribution. Do you know how I can technically do this? submitted by /u/That_Phone6702 [link] [comments]  ( 9 min )
    [Discussion] Making a Tutorial for Using a New Platform for ML in the climate and earth science space
    Hey guys Looking for some ideas. I'm building out a jupyter book that will be a tutorial on how to use a research platform for data analysis and modelling. My PI has given me free liberty over it. I can not think of a good idea to do the analysis and build the model on. It does not need to be complex but should be good enough so that any researcher, student or organization using the platform can get a good idea of how to use it for ML. Any thoughts on a good area to look into? Any recommendations? Note this will be a tutorial and as such an overly complex model is unnecessary. I just can not figure out what to look into so hoping you guys could give thoughts about possible areas in climate, weather and earth science that I could focus on for the tutorial in the jupyter book. submitted by /u/AdditionalFun3 [link] [comments]  ( 9 min )
    [D] Submitting a paper rejected by EMNLP to ARR
    First time submitting to ARR here. I was quite confused about this paper resubmission thing. I got rejected by EMNLP (submission directly to EMNLP with openreview) a week ago and I am planning to resubmit it to the ARR system (also using openreview). Does this EMNLP submission count as a previous ARR submission that should be mentioned or not? Do I need to withdraw the paper from EMNLP openreview prior to submitting it to ARR openreview? submitted by /u/Icy-Distribution6887 [link] [comments]  ( 9 min )
    [D] [P] UI-based AI agents: UI-Act
    Hi! Happy to share a project I've been working on for a while: UI-Act https://github.com/TobiasNorlund/UI-Act It's an AI model architecture designed to autonomously navigate and interact with computers using the graphical user interface. Think of it as a co-pilot that "sees" your screen and acts on it, just as a human would. In essence, it's a custom transformer model taking a prompt and screenshots as input, with output heads to predict low-level actions, i.e. mouse clicks. In the demo, it has been trained to compute simple expressions in a calculator window, using expert demonstrations/behavior cloning. If scaled up appropriately, however, it could provide a basis for a general agent to automate arbitrary tasks on a computer. I would be interested in hearing your thoughts on it, especially with regard to the trend towards general AI agents and assistants (Windows Copilot / Adept ACT-1 / AutoGPT etc). Equipping LMs with e.g. function calling is a trendy approach that relies on text-based state representations and APIs to take action. In cases where this is infeasible, UI-based agents might provide a more general alternative. As the agent's interface to the computer is shared with humans, it can be easily taught using expert demonstrations, and requires little or no technical expertise. Let me know what you think! submitted by /u/tobibbelfuel [link] [comments]  ( 9 min )
    [P] Learn how to make trustworthy and transparent machine learning models in Tsetlin Machine Book Chapter 7: Confidence, Trustworthiness, and Composites.
    ​ Confidence and trustworthiness of Tsetlin Machines. Hi all! Just completed a new chapter in the book An Introduction to Tsetlin Machines: https://tsetlinmachine.org Happy to receive feedback! Abstract: Collaboration can be essential to manage complex projects. One example is building a house. You then need the expertise of carpenters, plumbers, and electricians. Each profession brings unique skills to the table. Similarly, different types of Tsetlin machines can have distinct capabilities. In this chapter, you learn how Tsetlin machines can team up, allowing them to achieve more than they could on their own. The effectiveness of a team relies on recognising each member's strengths and limitations. Appreciating where your expertise stops and where your coworkers' expertise begins is crucial for effective collaboration. We first explore how Tsetlin machines can assess their competence in Section 7.1. Using the vote count from Chapter 1, you learn to measure how confident a Tsetlin machine is when it makes its decisions. It is possible to be highly confident and still perform poorly. To be trustworthy, confidence must be in line with one's capabilities. Therefore, Section 7.1 also covers how to evaluate trustworthiness. Next, in Section 7.2, you discover how to build a team of Tsetlin machines with different skills. By assessing each Tsetlin machine's confidence, you can lean on the confident ones when making decisions. The result is a Tsetlin machine composite - a construction where multiple Tsetlin machines join forces. You can think of it as a composite material, such as epoxy, which reinforces resin with fibres, making it strong, lightweight, and durable. submitted by /u/olegranmo [link] [comments]  ( 9 min )
    [R] [D] Need Peer Review: Unsupervised Learning for Student Dropout Anomaly Detection
    Hello all, Just wrapped up Task 1.1 for anomaly detection in student dropout rates. Keen for some extra eyes on it. Task highlights: data pre-processing & normalisation; K-Means clustering; Gaussian anomaly detector; PCA for dimensionality reduction. Links to the following files (data.csv, Task 1.1 - Rubric.pdf, Task1.1Script.ipynb): https://drive.google.com/drive/folders/17XcjEoYCrDWqf90VVNdkLAkYNdtWWwGu?usp=sharing Would greatly appreciate any feedback! Cheers! submitted by /u/Nook31 [link] [comments]
    [R] A method to assess trustworthiness of machine coding at scale
    submitted by /u/mnky9800n [link] [comments]  ( 8 min )
    [P] [vilays] Prototype Video Demo - Any Feedback from ML Engineers?
    Hi everyone, I’m thrilled to share a prototype we've been tirelessly working on. We are developing a virtualization environment for applications, specifically tailored to engineers, designers, data scientists, and researchers. In a nutshell, our platform enables users to run cloud-hosted desktop apps from any device, making it appear as if the applications are installed on their local machines, while they're actually operating on a remote server. The ultimate goal is to obliterate barriers between local and cloud execution, especially for compute-intensive workloads, thereby allowing seamless usage of High-Performance Computing software on the cloud with the scalability to adjust computing resources as per necessity. We’re here to solicit your invaluable feedback on our product video demo. Your insights will not only help us identify any blind spots and enhance our solution but also better understand the needs and preferences of our potential user base. 📽 [https://youtu.be/QR8FWRnPrXM?feature=shared] We're eagerly awaiting your thoughts and appreciate you taking the time to help us refine our product! Thank you! :) submitted by /u/aaron-cesaro [link] [comments]
    [D] Databricks Dolly 15k - Creating Synthetic Variants
    Hey all, I found Dolly to be a very interesting project when it was released, but I'm curious if it has similar value today because a lot of synthetic data generation options seem to be popping up. Now, it seems like Dolly is human generated/curated by over 5k employees (which is great), but wouldn't it be a better approach now to have Llama70b (or maybe Falcon) just generate future variants of 15k rows? I haven't been able to figure out why we aren't seeing more synthetic datasets like this on HF. Is the bottleneck licensing, compute or just incentive? Here's the original Dolly post thread: https://www.reddit.com/r/MachineLearning/comments/120usfk/r_hello_dolly_democratizing_the_magic_of_chatgpt/ submitted by /u/buzzyness [link] [comments]
    [D] Please suggest a Loss function for image to image task.
    What loss function should be used for a task that takes an input image with a lot of haze and produces an image with reduced haze? The architecture is a simple encoder-decoder. I tried MSE, as some articles and ML guides say that MSE is good for pixel-wise comparison, and also tried categorical cross-entropy, but neither works very well. MSE works but produces artefacts like red/green/blue spots and spatters, and at worst it produces a white image. The research on this task includes use of SIDNet [Single Image Dehazing Net], transmission maps, the dark channel prior algorithm, FFA-Net, etc., trained on the benchmark datasets (RESIDE, SOTS). I aim to create a simple architecture for a college project, so I chose the encoder-decoder architecture. Any suggestions are appreciated. submitted by /u/Wild_Basil_2396 [link] [comments]
    [D] Startup team demonstrates differentiable Swift compiler outrunning TensorFlow by 322X
    Autonomous systems startup PassiveLogic assembled a differentiable computing team to build a fast systems language with native-performance differentiability. Their latest benchmark trains networks two orders of magnitude faster than PyTorch and TensorFlow. See: LinkedIn Post. It's a collaborative effort with the Swift community and Apple's compiler team, using the Swift language as a strongly typed embedded language that performs ahead-of-time compilation of graph neural nets. The focus is on fusing systems programming and AI engineering into a single native high-performance language, to enable typed heterogeneous inference and training. The compiler development is open sourced as part of the standard Swift package. Try it yourself at swift.org. submitted by /u/taharvey [link] [comments]  ( 9 min )
    [D] How is test-driven development implemented in the context of machine learning?
    I recently tried to refactor a previous project that I had, but I realized that after making all of the changes the performance wasn't reproducible anymore. I decided to start from scratch, make incremental changes, and make sure that the model's performance is maintained with each change. Very basic in hindsight, but I guess I was too hasty with coding. Anyway, running the full model's training and evaluation with each change is proving to take too long. I'm curious if there's any other way that people implement TDD in the context of machine learning, since projects/applications tend to be more time consuming than typical applications. submitted by /u/Seankala [link] [comments]
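    One way to picture the "check after every incremental change" loop without a full training run is a fast, seeded smoke test on a tiny problem. The sketch below is an illustration under stated assumptions: `train_tiny_linear_model` is a hypothetical stand-in for a project's training entry point, and the toy dataset and thresholds are made up.

```python
import random

def train_tiny_linear_model(seed, steps=500, lr=0.5):
    """Fit y = w*x + b by full-batch gradient descent on a fixed toy dataset.

    Returns (w, b, final_mse). Stands in for a real training function.
    """
    rng = random.Random(seed)
    xs = [i / 10 for i in range(10)]
    ys = [3.0 * x + 1.0 for x in xs]  # noiseless target: w=3, b=1
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, loss

def test_training_is_deterministic():
    # Same seed must give bit-identical results, otherwise refactors cannot be compared.
    assert train_tiny_linear_model(0) == train_tiny_linear_model(0)

def test_model_fits_toy_data():
    # A model that cannot fit ten noiseless points signals a broken training loop.
    w, b, loss = train_tiny_linear_model(0)
    assert loss < 1e-6
    assert abs(w - 3.0) < 1e-3 and abs(b - 1.0) < 1e-3

test_training_is_deterministic()
test_model_fits_toy_data()
```

    The two functions are written in pytest style, so they could run in seconds on every commit, with the full training and evaluation reserved for less frequent checks.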
  • Open

    Developing industrial use cases for physical simulation on future error-corrected quantum computers
    Posted by Nicholas Rubin, Senior Research Scientist, and Ryan Babbush, Head of Quantum Algorithms, Quantum AI Team If you’ve paid attention to the quantum computing space, you’ve heard the claim that in the future, quantum computers will solve certain problems exponentially more efficiently than classical computers can. They have the potential to transform many industries, from pharmaceuticals to energy. For the most part, these claims have rested on arguments about the asymptotic scaling of algorithms as the problem size approaches infinity, but this tells us very little about the practical performance of quantum computers for finite-sized problems. We want to be more concrete: Exactly which problems are quantum computers more suited to tackle than their classical counterparts, an…  ( 94 min )
  • Open

    UK Tech Festival Showcases Startups Using AI for Creative Industries
    At one of the U.K.’s largest technology festivals, top enterprises and startups are this week highlighting their latest innovations, hosting workshops and celebrating the growing tech ecosystem based in the country’s southwest. The Bristol Technology Festival today showcased the work of nine startups that recently participated in a challenge hosted by Digital Catapult — the Read article >  ( 6 min )
    Get in Gear: ‘Forza Motorsport’ Races Onto GeForce NOW
    Put the pedal to the metal this GFN Thursday as Forza Motorsport leads 23 new games in the cloud. Plus, Acer’s Predator Connect 6E is the newest addition to the GeForce NOW Recommended program, with easy cloud gaming quality-of-service (QoS) settings built in to give Ultimate members the best streaming experience. No Breaks, No Limits, Read article >  ( 6 min )
  • Open

    DeepMind 2022 'full accounts' financial report: 2022 budget: £1,081 million ($1.3b) (decreased by a fifth from 2021)
    submitted by /u/gwern [link] [comments]
    RL for non-Python environments?
    Most real-world applications for RL (robotics, game dev, finance) are not normally done in Python, yet all major RL frameworks are written in Python. Is there a good, high-performance cross-language framework to do RL in other languages like C++/.NET/Java? If not, do you think people would be interested in such a framework? submitted by /u/xor24 [link] [comments]
    Reinforcement learning agents that adhere to a causal model of the problem
    Do you know any work that tries to develop RL agents that exploit some sort of high-level model of the problem (it could also be given by an expert human) to learn faster or operate on out-of-distribution scenarios? I'm particularly interested in Causal Models, but any similar thing could be interesting for me submitted by /u/fedetask [link] [comments]
    What is the intuitive explanation for using log probabilities in policy gradient methods instead of plain probabilities? Does it improve gradient descent optimization?
    submitted by /u/aabra__ka__daabra [link] [comments]
    Why does Drq-v2 sample from replay by episode then experience?
    I've been looking at DrQ-v2 (https://github.com/facebookresearch/drqv2) recently and it samples from replay in a way that seems odd to me but may have a purpose I don't understand. They store experiences in a compressed file by episode, this makes some sense since it means they don't have to store everything in RAM and they delay disk writes until the end of the episode so they don't slow down the sim operation. On sampling, they randomly select an episode then randomly select an experience from the episode, calculating the n-step reward dynamically at sample time instead of at experience storage time. This is then fed to the model by a pytorch DataLoader. This means a _lot_ of disk reads during the optimization step which can't be ideal but I'll put that aside. What is the advantage of doing this selection by episode? It may give a better spread across episodes in each update, but I'm not sure that makes up for the potential downsides of making prioritization and other replay tricks much harder. Any ideas? submitted by /u/EDMismyO2 [link] [comments]
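    The sampling scheme described in the post (pick an episode, pick a step inside it, build the n-step reward at sample time) can be sketched as follows. This is a simplified illustration, not DrQ-v2's actual code: the in-memory `(obs, action, reward)` episode layout and the `sample_transition` name are assumptions, and the disk loading and DataLoader parts are left out.

```python
import random

def sample_transition(episodes, nstep, discount, rng):
    """Draw one n-step transition: uniform over episodes, then uniform over start steps.

    Each episode is a list of (obs, action, reward) tuples, where reward is the
    reward received after taking action in obs. Every episode must contain at
    least nstep steps.
    """
    episode = rng.choice(episodes)
    # Leave room for n steps so the window stays inside the episode.
    start = rng.randrange(0, len(episode) - nstep + 1)
    obs, action, _ = episode[start]
    n_step_reward = 0.0
    running_discount = 1.0
    for k in range(nstep):
        # Accumulate the discounted reward lazily, at sample time.
        n_step_reward += running_discount * episode[start + k][2]
        running_discount *= discount
    last = start + nstep
    # Observation n steps ahead, or None when the window reaches the episode end.
    next_obs = episode[last][0] if last < len(episode) else None
    return obs, action, n_step_reward, running_discount, next_obs

# Usage on a made-up four-step episode.
toy_episodes = [[(0, "a", 1.0), (1, "a", 1.0), (2, "a", 1.0), (3, "a", 1.0)]]
print(sample_transition(toy_episodes, nstep=3, discount=0.5, rng=random.Random(1)))
```

    In this sketch, when all episodes have the same length the two-stage draw is uniform over valid start positions; with variable-length episodes, steps from short episodes are drawn more often, which is one reason prioritization schemes are harder to layer on top.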
    Can reinforcement learning models learn to rank?
    I have a very simple observation: a list of random values, state = [random.uniform(-0.2, 0.2) for _ in range(200)], and reward = state * actions. The reward does not use the next state; it uses the previous state I gave to the model. So basically I already give the answer to the model: the best action is 1 if state > 0 and -1 if state < 0. I tried using PPO, but it does not seem to learn at all. My test_env.py is here:
```
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from gymnasium.utils import seeding
from stable_baselines3.common.vec_env import DummyVecEnv
import random

class TestEnv(gym.Env):
    metadata = {"render.modes": ["human"]}

    def __init__(
        self,
        item_count,
        test_steps,
        is_train = True,
    ):
        self.is_train = is_train
        self.test_steps = test_step…
```
  • Open

    Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets
    These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (opens in new tab) (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication. Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities […] The post Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets appeared first on Microsoft Research.  ( 10 min )
  • Open

    Homework problems are rigged
    This post is a follow-on to a discussion that started on Twitter yesterday. This tweet must have resonated with a lot of people because it’s had over 230,000 views so far. You almost have to study advanced math to solve basic math problems. Sometimes a high school student can solve a real world problem that […] Homework problems are rigged first appeared on John D. Cook.  ( 7 min )
  • Open

    12 Generative AI Trends to Watch Out for
    The advent of generative AI is empowering everyone alike – organizations, small businesses, individuals, students, and medical professionals, to name a few. The last couple of years have been revolutionary for artificial intelligence innovation and transformation. How will 2024 shape up for AI, AI tools, and related professionals? Let’s analyze the trends that are most… Read More »12 Generative AI Trends to Watch Out for The post 12 Generative AI Trends to Watch Out for appeared first on Data Science Central.  ( 20 min )

  • Open

    Predictive AI analyzing attraction to facial features (iris Dating app)
    Top dating apps Tinder, Hinge and Bumble have all stated that they're already investing in AI to make their apps better. They're using it to verify profiles, match people based on bios and interests, and help generate profile descriptions and liven conversations. But what about machine learning on user photos? iris Dating uses AI to analyze user input in the form of liking or disliking faces ("swiping" profiles). We all know if we like blondes or brunettes, blue or brown eyes, short or long hair, beard or no beard, etc. But AI can pick up the subtlest features (proportions, distances, curvatures etc.) and build a face map. A matrix of features, if you will. It doesn't just look for a person looking like your favorite celebrity crush. It understands what you're really attracted to. From there it's an easy path: if it knows which features attract me, it can predict my level of attraction to a specific individual (specifically, their face). Find the persons with the highest predicted attractiveness (for me, not for everyone), rank them by attraction for me, and we have a potential high mutual attraction match. The two stats I have are that on average women like 55%(!) of the profiles iris picks for them; and that users have 40x higher chances of matching when they've trained the model to understand their taste. I know it takes a lot more than a pretty face to make for a great relationship, but it sure doesn't hurt to start with strong physical attraction. Missed connections on Craigslist are about just that: seeing a face you can't forget. Find me more of these "wow" faces and let's go from there. What do you think? Is it too early? Too bold? Too niche? submitted by /u/akahamlet [link] [comments]
    Superman if portrayed by different actors (as imagined by AI)
    submitted by /u/fat_n_stupid [link] [comments]
    DALL·E 3 is blocking copyrighted material. Also DALL·E 3:
    submitted by /u/Zimmax [link] [comments]
    The AI research job market shit show
    The AI research job market is going through a shakeup, with a high demand for skilled researchers and a scarcity of talent. Companies closely monitor the movements of researchers as an indicator of their ability to transition from concept to product. The market is highly competitive, with researchers being offered high salaries and compensation packages. This has led to high turnover and attrition in many companies, causing unsettledness among employees. Despite the challenges, the investment in AI research is expected to drive innovation and push the boundaries of the Transformer architecture. Source : https://www.interconnects.ai/p/ai-research-job-market submitted by /u/NuseAI [link] [comments]
    Are there any low res (pixel art) art tools?
    I'm looking for ways to create art for a game I'm creating. submitted by /u/Yenii_3025 [link] [comments]
    Inverting Transformers Significantly Improves Time Series Forecasting
    Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting. The issue is how most Transformer architectures treat each timestamp as a token and fuse all the variable data from that moment. This creates two big problems: (1) variables recorded at slightly different times get blurred together, losing important timing info; (2) each token can only see a single moment, with no long-term dependencies. So Transformers struggle to extract useful patterns and correlations from the data. Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid; they just need to flip the architecture for time series data. Their "Inverted Transformer" (or iTransformer): (1) makes each variable's full history into a token, instead of each timestamp; (2) uses self-attention over variables to capture relationships; (3) processes time dependencies per variable with feedforward layers. This simple tweak gives all the benefits we want: state-of-the-art forecasting accuracy, beating both linear models and standard Transformers; better generalization to unseen variables; increased interpretability; and the ability to leverage longer historical context. TLDR: Inverting Transformers to align with time series structure allows them to outperform alternatives in working with time series data. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
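    The difference between the two token layouts is easiest to see on a small array. The sketch below only shows how the input is sliced into tokens; it is not the iTransformer model, and the function names and toy values are made up for illustration.

```python
def timestamp_tokens(series):
    """Standard layout: one token per time step, holding every variable at that step."""
    return [list(step) for step in series]

def variate_tokens(series):
    """Inverted layout: one token per variable, holding that variable's whole history."""
    return [list(history) for history in zip(*series)]

# Toy multivariate series: 4 time steps x 3 variables (values are made up).
series = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]
standard = timestamp_tokens(series)
inverted = variate_tokens(series)
print(len(standard), len(standard[0]))  # 4 3 -> four tokens, each of width 3
print(len(inverted), len(inverted[0]))  # 3 4 -> three tokens, each of width 4
```

    With the inverted layout, attention compares whole variable histories with each other, and the per-token feedforward layers see the full time axis of a single variable.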
    Best ChatGPT Plugins: Ultimate List for 2023
    submitted by /u/Senior_tasteey [link] [comments]
    The NSFW dream (truly unrestricted AI desires)
    I guess I'm looking for the impossible, but does anyone know of a generator that has all of the following qualities, in order of importance from least to most important: Has a massive variety of styles like Womba's private discord server does. A "Create variants" function like the one a Womba discord personal server generator gives you. Generates beautiful "digital art" style images like https://www.unstability.ai/ does. (Man, those images are pretty.) Faces are really good most of the time. (It's frustrating, as it looks so good but I can't seem to get any group sex poses going on.) Provides a variety of poses such as https://easywithai.com/ai-image-generators/promptchan-ai/, which also allows you to upload your own images for poses: I could upload a real-life orgy image and, as long as it could distinguish the bodies as being separate (not a big pile of limbs), it does pretty well, but it severely lacks in facial quality. Like a big booty girl in hyperreal style 1080P or higher resolution. (Again, Womba is good here, but they are just extreme on their restrictions.) 1080P should be the minimum for any paid service; how can we truly enjoy a full-screen image on anything less without it pixelating? Doesn't cost $150/month. (Yes, I found one that does all this, seduced.ai, but their premium subscription costs about $150/month and it's not even unlimited.) I paid $90 for a full year of unlimited Womba discord, but again, $150/month is just not worth it. If anyone knows of a server that has all these for around $25/month or less, please let me know. I'd really appreciate it. submitted by /u/russader [link] [comments]
    Can AI reference both photos to make the black and white photo the same as the colour image?
    I have a high resolution black and white print and a generic quality colour image of the same photo, that I'd like AI to look at both images and make the B&W into colour. Is this possible? submitted by /u/NikonD3X1985 [link] [comments]
    AI Morality Scenarios.
    submitted by /u/Philipp [link] [comments]
    One-Minute Daily AI News 10/10/2023
    Cybersecurity firm Avast is calling out a long-lived tool, “LoveGPT,” that has haunted popular dating apps and has been upgraded with artificial intelligence, gaining the ability to build fake profiles and manipulate unsuspecting users.[1] A source told the WSJ that Microsoft used AI from its partner OpenAI to launch GitHub Copilot at $10 per month, but lost an average of $20 per user per month in the first months of 2023. Some Copilot users cost as much as $80 per month.[2] SK Telecom said on Monday that it successfully wrapped up its international AI competition of 226 teams, “Prompter Day Seoul 2023,” held in partnership with OpenAI.[3] Google DeepMind Researchers Introduce Promptbreeder: A Self-Referential and Self-Improving AI System that can Automatically Evolve Effective Domain-Specific Prompts in a Given Domain.[4] Sources: [1] https://decrypt.co/200787/lovegpt-ai-dating-apps-catfishing-hack-avast [2] https://game-news24.com/2023/10/10/microsoft-lost-20-for-every-10-copilot-ai-subscription-report-45-for-every-10-copilot-ai/ [3] https://asianews.network/skt-openai-hold-ai-competition-for-social-good/ [4] https://www.marktechpost.com/2023/10/08/google-deepmind-researchers-introduce-promptbreeder-a-self-referential-and-self-improving-ai-system-that-can-automatically-evolve-effective-domain-specific-prompts-in-a-given-domain/ submitted by /u/Excellent-Target-847 [link] [comments]
    I finally have enough ai tools and here is my complete list
    VIDEO EDITING InVideo CapCut Filmora Veed io Rotor KEYWORD RESEARCH VidiQ Summarized YT Summary CONTENT CREATION Explore AI Vidds Opus Descript Lumen5 Steve AI AUDIENCE ENGAGEMENT ManyChat TubeBuddy Canva Hootsuite ANALYTICS Vidyo Nova AI Daily Life Tools Taskade TLVD Bardeen AI Vondy AI Notion AI Chatbots Tools YatterPlus Typewise Quickchat Cohere Kaizan Coding Tools Durable AI 10Web Akkia Replit Deepcode Design Tools Flair AI Autodraw StockIMG Booth AI Clipdrop Content Creation Tools Writesonic Beautiful AI Tome AI ChatABC Steve AI Music Tools Boomy Amper Jukedeck Melodrive BrainFM Writing Tools AISEO Quillbot Writesonic Bertha AI Simplified Youtube Tools Eightify Thumbly Steve AI ClipMaker TubeBuddy Twitter Tools Tweetmonk Tribescaler Postwise Tweetlify Tweethunter Sales Tools Lavender Warmer Regie Twain Octane Marketing Tools simplified ContentEdge Copt Smith Copy AI Mutiny Research Tools Consensus Paperpal Trinka Writesonic scholarcy I'm just sharing my experiences and observations in the field of AI. LIST AND SITE submitted by /u/PerceptionPlayful469 [link] [comments]
    Write Your Next Book with These Awesome ChatGPT Prompts
    Awesome ChatGPT Prompts submitted by /u/Senior_tasteey [link] [comments]
  • Open

    [D] how to download datasets from huggingface
    Hello, first time using Google Colab and Hugging Face datasets. The Colab notebook is easy to set up, but I can't figure out how to download datasets from Hugging Face. I am trying to download the https://huggingface.co/datasets/kili-technology/plastic_in_river dataset in a Colab notebook. After reading some beginner forums, I modified the example to look like the one below, but it failed:
        from datasets import load_dataset
        data_files = {"train": "train.csv", "test": "test.csv", "validation": "validation.csv"}
        dataset = load_dataset("kili-technology/plastic_in_river", data_files=data_files)
    I assume this is because there's no path to the files to be downloaded. Can someone explain how to download datasets from Hugging Face, please? The output was:
        Downloading builder script: 100% 3.25k/3.25k [00:00 in ()
              2
              3 data_files = {"train": "train.csv", "test": "test.csv", "validation": "validation.csv"}
        ----> 4 dataset = load_dataset("kili-technology/plastic_in_river", data_files=data_files)
        5 frames
        /usr/local/lib/python3.10/dist-packages/datasets/data_files.py in resolve_pattern(pattern, base_path, allowed_extensions, download_config)
            366 if allowed_extensions is not None:
            367 error_msg += f" with any supported extension {list(allowed_extensions)}"
        --> 368 raise FileNotFoundError(error_msg)
            369 return out
            370
        FileNotFoundError: Unable to find 'https://huggingface.co/datasets/kili-technology/plastic_in_river/resolve/main/train.csv'
    submitted by /u/0ni0nrings [link] [comments]  ( 9 min )
    [D] How do byte-level language models work?
    I've recently been trying to pre-train my own small language model on the tiny-series datasets on huggingface. I also wanted to use a model similar to MEGABYTE but I don't understand how using bytes would work. The only implementation I could find from lucidrains used str(chr(max(32, token))) to decode any token (byte) to a character and put the embedding size as 256. Firstly, why 256 and not 256-32 as any values below 32 are ignored? Also, many byte-level models including this and ByteT5 mention that they can process any text sequence even in a multilingual setting, however how would that be true if we are only using one byte, would we have to move to 2 bytes or use an UNK token, and if we did use 2 bytes that would make our embedding size around 65000 which defeats sort of the point as o…  ( 10 min )
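On the vocabulary-size question, a minimal sketch may help, assuming the model consumes raw UTF-8 bytes (which is how byte-level models are usually described). Any string in any language encodes to a sequence of values in 0..255, so 256 embeddings cover everything and no UNK token or 2-byte vocabulary is needed; non-ASCII characters simply take several tokens each. The `max(32, token)` in the decoder quoted above most likely only changes how control bytes are displayed, which would explain why the embedding table still has 256 rows. The function names below are illustrative and not taken from any particular repository.

```python
def encode(text: str) -> list[int]:
    """Map text to byte-level token ids (each in 0..255)."""
    return list(text.encode("utf-8"))


def decode(tokens: list[int]) -> str:
    """Invert encode(); invalid byte sequences become U+FFFD."""
    return bytes(tokens).decode("utf-8", errors="replace")


VOCAB_SIZE = 256  # one embedding row per possible byte value

samples = ["hello", "h\u00e9llo", "\u4f60\u597d", "\U0001F642"]
for s in samples:
    ids = encode(s)
    # Every token id fits the 256-entry vocabulary, whatever the script.
    assert all(0 <= i < VOCAB_SIZE for i in ids)
    print(f"{ascii(s)}: {len(s)} chars -> {len(ids)} byte tokens")
```

The trade-off is sequence length: a two-character Chinese string becomes six tokens, which is part of what architectures like MEGABYTE try to make affordable.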
    [P] Evaluating and tuning a model when the population may change YoY and best practices for mitigating overfitting on features that correlate with time.
    Consider a predictive model that predicts whether an outcome Y will occur in Q1 2023, based on data from Q1 2022. Now, if we want to predict outcomes for 2024, we must use last year's data to build the model, but we will have some bias if there are features that vary year over year. Is the best approach in such a situation to tune/validate the model on other years, in the hope of mitigating any features that are correlated with a specific year? Any help would be much appreciated, as I can't find agreed-upon methods. submitted by /u/unga123 [link] [comments]  ( 9 min )
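One common way to probe this is rolling-origin (walk-forward) validation: train on each year, score on the following one, and check how stable the metric is across year pairs. The sketch below only builds the splits; the record layout (a `year` key per row) is an assumption made for illustration.

```python
from collections import defaultdict


def rolling_year_splits(rows, year_key="year"):
    """Yield (train_year, test_year, train_rows, test_rows) for consecutive years."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_key]].append(row)
    years = sorted(by_year)
    for prev, nxt in zip(years, years[1:]):
        if nxt == prev + 1:  # only pair adjacent years
            yield prev, nxt, by_year[prev], by_year[nxt]


# Toy records; a real table would carry the actual features and outcome.
rows = [
    {"year": 2020, "x": 1.0, "y": 0},
    {"year": 2021, "x": 1.2, "y": 1},
    {"year": 2021, "x": 0.7, "y": 0},
    {"year": 2022, "x": 0.9, "y": 1},
]
for train_year, test_year, train, test in rolling_year_splits(rows):
    print(f"train on {train_year} ({len(train)} rows), test on {test_year} ({len(test)} rows)")
```

A feature whose importance swings sharply between year pairs is a candidate for removal or for re-expression in a year-independent form.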
    Is there a model to input anecdotal text stories as training data to return a more comprehensive story? [P]
    I have a goal and am looking for direction from others who know more than me about machine learning. I want to submit 5-10 pieces of text to a model. The texts will be anecdotes about a common experience, each from a different person’s perspective. For example, if a family visits a theme park, each family member will have a story or two about the day. Each family member’s story would be a submission to the model. One person might have loved the roller coaster and can tell about the exciting parts. Another person maybe just can’t stop talking about how great the food was. Someone else maybe felt sick and complains the line at the bathroom was too long. Perhaps another family member rode the same roller coasters as the first person but hated it, so would describe it very differently. All these anecdotes are submitted to the model. Then the model can be queried, such as, “Tell me about the theme park.” or “I love roller coasters. Tell me about the theme park.” or “I tend to overeat, tell me about the theme park.” (the model wouldn’t hype up the food; maybe it would talk about how much exercise the visitors get by walking around all day.) In this theme park context, the model would have a preconception of a theme park. It would know the general concept, know of several examples or standards that it could compare this theme park against, understand it’s all for fun, etc. This type of model may already be available as an API or model and I just don’t know about it. That’d be fine, please point me towards it. Or maybe there’s something already available that would need to be tweaked or customized. submitted by /u/Semper_Disco [link] [comments]  ( 10 min )
    [D] Help me learn ML easily specially in model building and EDA
    Can you give easy-to-understand sources and a hands-on practice methodology to master ML? Help me understand how to build models inside and out. Thank you submitted by /u/the_mystic_1 [link] [comments]  ( 9 min )
    NSF workshop on LLMs in chemistry education [R]
    Over Feb 12-13 of 2024, the National Science Foundation (NSF) is sponsoring a workshop titled “Integrating LLMs into the Materials Chemistry Curriculum” in Golden, Colorado. We aim to explore and develop innovative ways to incorporate large language models (LLMs, e.g. GPT, ChatGPT, and Bard) into upper division chemistry laboratories and virtual lab experiences. During the workshop, participants will brainstorm and create demonstrations incorporating LLMs into the curriculum. The event will bring together folks across academia and the private sector with disciplinary backgrounds that range across chemistry, computer science, materials science, physics, and education. There is no registration fee, and we anticipate being able to cover the majority of participant travel costs thanks to NSF support. Participants early in their career (i.e., graduate students, postdoctoral scholars) are particularly encouraged to apply. If you are interested in participating in this workshop, please fill out the Google form (link below). Please feel free to distribute this invitation widely. Application: https://forms.gle/P9QdNiCuaUAHFZj29 submitted by /u/KC2792 [link] [comments]  ( 9 min )
    [P] Where to find projects to contribute to?
    Hello, I'm a developer with 6 years of experience in the mobile field, and I recently completed my master's degree in artificial intelligence (Text mining). I want to transition into the field of AI, but I need more experience with projects in the "real world," outside of academia, and I'd like to contribute to an open-source project. I looked on Github, but I ended up feeling confused and not sure where to start. P.S.: I did some research in this subreddit, but the posts about contributions seemed a bit dated. submitted by /u/Substantial_Fact_205 [link] [comments]  ( 9 min )
    [P] Image based Python + OpenCV automation, MMORPG Laghaim Auto-Fighter Bot Demo
    Video: https://youtu.be/0m12vkaoE7w A detailed Medium post will follow in the upcoming days: https://medium.com/@pssdplayer submitted by /u/HistorianCrafty3514 [link] [comments]  ( 9 min )
    [D] - I have 20-30 million shopify products dataset, any ideas?
    I have collected over 20 million shopify products & had the following ideas for them: - LLM ( Finetune an llm to know how to speak ecom ) - Video bot that can make videos on those products, using their description, elevenlabs & AIFaceGen - EcomStore that will markup the products about 30% ( This will need the bot to frequently scrape, to ensure that the products are up to date ) - Selling the dataset based on fragments, like 1$ per 1k-10k records, depends on what sells. Please let me know if these are good ideas, and if someone would like to support / help me in any way ( I just need to selfhost my supabase instance, & add all the products to it & then dev can get started ) submitted by /u/AdonisCodes [link] [comments]  ( 9 min )
    [D] Best open-source AI model for QA generation from context
    As the title says I’m looking for an open-source AI model for generating question-and-answers with a correct answer option and explanation to the correct answer from the input context. So far I have tried these models, TheBloke/Llama-2-7B-GPTQ TheBloke/Llama-2-13B-GPTQ TheBloke/Llama-2-7b-Chat-GPTQ (the output is not consistent. Sometimes I get an empty response or without the correct answer option and an explanation data) TheBloke/Llama-2-13b-Chat-GPTQ (even 7b is better) TheBloke/Mistral-7B-Instruct-v0.1-GGUF(so far this is the only one that gives the output consistently. But not able to generate more than 2 QA due to max token limit of 512. Even tried setting the max token as 1024, 2048 but nothing helped) TheBloke/Mistral-7B-OpenOrca-GGUF NousResearch/Llama-2-7b-chat-hf My system configurations are: Windows 10 with 16GB GPU Additional Information: The input prompt token will be around 250-350 tokens per request. submitted by /u/gokulcv [link] [comments]  ( 9 min )
    Churn Prediction [R]
    I want to build a model to predict churn in a third party logistics company. What variables should make up my data? Any help would do. Thanks submitted by /u/DisastrousAd8814 [link] [comments]  ( 9 min )
    [D] Recommendations for CPU-Based Real-Time Vector Database Indexing and Matching?
    Hello everyone, I have a specific online vectorization use case: I'm looking to search the internet for articles, vectorize these articles along with the search queries, and then retrieve the most relevant passages from them. Currently, I have basic hosting through DigitalOcean. Could anyone recommend the most suitable vector database for this task? Additionally, considering my resources, is it feasible to run this system solely on CPUs? And if so, would this setup be scalable if deployed on CPUs only? submitted by /u/Traditional-Poet2746 [link] [comments]  ( 9 min )
    [R] network digital twin for cybersecurity
    Hi all, for a text work of mine I am trying to do a project based on generating digital twins of networks. My goal is to create a digital twin of a network and then work on it from a cybersecurity point of view. I will briefly explain what I would like to do. I am currently using OpenVAS for vulnerability scans at the network level, so basically I pass OpenVAS a network (for example 192.168.xx.xx/24) and it automatically identifies all the vulnerabilities that are there. The next step (what I'd like to do, and why I'm asking for your advice) is to create a digital twin of the newly scanned network and then perform a penetration test on this digital twin, without stressing the actual network. Ideally, I would like to pass the output of the OpenVAS vulnerability scans, routing rules, and firewall rules to some tool that will generate the digital twin of the network for me. The twin would then be used for offensive cybersecurity, so exploits, privilege escalation, etc. would be tested on it without worrying about breaking some service or stressing the real network. What I am asking is: do you know of any tool that would do the trick for me? That is, a tool that generates a digital twin of a network from inputs such as vulnerability scans (xml, json, csv, etc.), routing rules, firewall rules, pcap traces, etc. Do you have any references or documentation? Are you aware of any open source tools? Thank you for your help! submitted by /u/Salt-Arugula-8128 [link] [comments]  ( 9 min )
    Best approach for VFX lineups using ML [Project]
    Quick intro: Lineups are one of the first steps in the VFX pipeline. Source: - original footage that was shot on set - a reference (QuickTime) video from the film edit. Task: The reference shows modifications to the original footage. They can be: - timewarp (either fixed retimes like 200% speed or completely random) - transform (moving the image on the x/y axis, rotation, scale, etc.) So the lineup task is to align the original footage to the reference QuickTime. What I did so far: Made a simple script in the software Nuke, using some Python and readily available tools, to make it work on a simple shot. The general logic is to compare every frame; the associated frame is the one with the least difference between the two. This works on super simple and straightforward tasks (can provide more info if needed). Issue: Some references are more heavily modified. They can have some muzzle flash, basic 3D objects, or even some slight error introduced, like a distortion applied to the image when none should be, so it will never be perfectly aligned. This makes the full-frame difference higher for some frames, making the lineup wrong (it will pick the wrong frame that has no muzzle flash, because it has less difference...). Some other things to consider: watermarks cover the reference, and the colors do not match perfectly; I can get them close enough, but there's a difference. Conclusion: Because of those issues, I'm thinking about using machine learning. I have next to no knowledge of the subject. I know there are a bunch of ways to train a model, but I have no clue where to start, so here's my question: which learning approach has the best potential to solve this task? submitted by /u/Pretty_Customer_8113 [link] [comments]  ( 9 min )
    [R] What are some interesting research topics to study in the intersection of ML and signal processing currently?
    I will have to pick and start a research project next January for my final year. So wanted to start exploring now. I want to do something substantive and interesting enough to get published. submitted by /u/BadMeditator [link] [comments]  ( 9 min )
    [R] Mistral 7B
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [R] Tsinghua University: Inverting Transformers Significantly Improves Time Series Forecasting
    Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting. The issue is how most Transformer architectures treat each timestamp as a token and fuse all the variable data from that moment. This makes two big problems: Variables recorded at slightly different times get blurred together, losing important timing info Each token can only see a single moment, no long-term dependencies So Transformers struggle to extract useful patterns and correlations from the data. Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid, they just need to flip the architecture for time series data. Their "Inverted Transformer" (or iTransformer): Makes each variable's full history into a token, instead of each timestamp Uses self-attention over variables to capture relationships Processes time dependencies per variable with feedforward layers This simple tweak gives all the benefits we want: State-of-the-art forecasting accuracy, beating both linear models and standard Transformers Better generalization to unseen variables Increased interpretability Ability to leverage longer historical context TLDR: Inverting Transformers to align with time series structure allows them to outperform alternatives in working with time series data. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
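To make the "inversion" concrete, here is a minimal sketch of just the tokenization step as the summary describes it, not the paper's model. A multivariate series with T time steps and N variables becomes T tokens of size N in the standard layout, and N tokens of size T in the inverted one. The function names and the toy data are illustrative.

```python
def timestamp_tokens(series):
    """Standard layout: one token per time step, holding all variables."""
    return [list(row) for row in series]


def variate_tokens(series):
    """Inverted layout: one token per variable, holding its full history."""
    return [list(col) for col in zip(*series)]


# 4 time steps x 3 variables (toy numbers)
series = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]
print(len(timestamp_tokens(series)), "tokens of size", len(series[0]))
print(len(variate_tokens(series)), "tokens of size", len(series))
```

In the inverted layout, attention mixes information across variables while each token already carries the whole lookback window, which matches the division of labor described in the post.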
    [R] How to train multiple models on multiple GPU's simultaneously
    Hi! The task is to train N TensorFlow/Keras models using [2, ... N] GPU's on K different datasets in parallel. It is for testing a custom pipeline, you create a pipeline, you run it on multiple different datasets and get an aggregated metric. For now I'm using a for loop but how do I do it in parallel e.g. on AWS? I googled, but surprisingly haven't found a lot of results. I looked at Apache AirFlow because I'm vaguely familiar with it but so far I couldn't get a definite answer on how it works with multiple GPU's. Second option I found is to use Ray library. Is it worth trying? What should I use to solve this task? Thanks. UPD. I'd also consider a PyTorch solution as a backup option. UPDUPD. Jesus, why Reddit removing newlines after edit? submitted by /u/Disastrous_Sky9468 [link] [comments]  ( 9 min )
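One low-tech option, before reaching for an orchestrator, is to start one process per dataset and pin each process to a GPU through the `CUDA_VISIBLE_DEVICES` environment variable, which both TensorFlow and PyTorch respect. The sketch below is only a starting point under stated assumptions: `train.py` and its `--dataset` flag are hypothetical, and the round-robin plan does not queue jobs, so with more datasets than GPUs several jobs would share a device unless you launch them in batches.

```python
import os
import subprocess
import sys


def plan_jobs(datasets, num_gpus):
    """Assign each dataset to a GPU index round-robin."""
    return [(name, i % num_gpus) for i, name in enumerate(datasets)]


def launch(script, dataset, gpu):
    """Start one training process that can only see the given GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.Popen([sys.executable, script, "--dataset", dataset], env=env)


jobs = plan_jobs(["ds_a", "ds_b", "ds_c", "ds_d", "ds_e"], num_gpus=2)
print(jobs)

# To actually run (train.py is a hypothetical script that writes its own metrics):
# procs = [launch("train.py", d, g) for d, g in jobs]
# for p in procs:
#     p.wait()
```

Each process would write its metric to a file or database, and the aggregation happens after all processes exit. Libraries such as Ray add scheduling and queueing on top of the same idea.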
    [D] How important is having a great team when ML solutions are slow to be adopted ? When to move on?
    My team and managers are so easy to be with. Very grateful for that. The pay is okay. 150k/yr TC in Midwest. Hard for me to make a switch given how much I am appreciated. I almost feel spoiled when it comes to flexibility. I have overachiever tendency and the pace is so slow in adopting my ML models. I am the “lead”/senior data scientist in an R&D supporting scientists decision making with machine learning. Importantly, I am in a huge multinational consumer product company and I am not in the Data science organization, I bridge between the two and the data science expert on the team. I have developed the domain expertise and I have a PhD in an applied computational field with 5 years experience . I am not as challenged with getting deeper into complex stats, I have been really honing the soft skills of communication, influencing etc so getting comfortable in a senior role. Also I have been growing as a ML engineer building my own pipelines and deploying my models on prem server that they bought for me. I am not sure how greener it is on the other side, how do senior folks approach deciding when to move on? Any input is much appreciated. submitted by /u/Diligent_Trust2569 [link] [comments]  ( 9 min )
    [D] [P] [R] What to do when your model isn't testing well?
    I have 200k observations overall. I split my data into a training and a test set. My target variable has low prevalence (~9%), so I tried random oversampling, random undersampling and SMOTE. After I fit my models, I tested them on my test set and the results were awful. I mean, I've never had a model with 50% ROC-AUC, but then again, I have rarely developed ML models. I'm wondering what the next steps would be? I understand there could be some sort of overfitting. But what would you do next? Any references would be appreciated :) submitted by /u/Actual-Muscle-9846 [link] [comments]  ( 9 min )
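Two things are worth ruling out first: resampling should happen after the split and only on the training portion, and a ROC-AUC of 0.5 means the scores rank positives no better than chance, which often points at a pipeline problem (shuffled labels, mismatched preprocessing between train and test) rather than ordinary overfitting. The sketch below shows both ideas in plain Python; the row layout (`y` key) and the helper names are assumptions for illustration, not any library's API.

```python
import random


def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def oversample_minority(rows, label_key="y", seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


# Toy data with 10% positives; split first, then resample the training part only.
rows = [{"x": i, "y": int(i % 10 == 0)} for i in range(100)]
train, test = rows[:80], rows[80:]
train_balanced = oversample_minority(train)

print(len(train_balanced), "training rows after oversampling;", len(test), "untouched test rows")
print("AUC on a toy ranking:", roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The test set keeps its natural 9% prevalence, so the reported metric reflects what the model would see in production.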
    [D] Fastest lipsync projects?
    Given an image, and an audio file (TTS generated), what is current fastest library that can output me a video of a talking image with the audio on it? I have made some research and I have seen Wav2Lip and SadTalker. Any better options? I am looking for processing speed and for the lesser hardware intensive solution for a side project. Thanks! submitted by /u/reddit2vid [link] [comments]  ( 9 min )
    [P] LoopQuest, A Github-like platform to host simulation environments for AI training
    Hello everyone! Here is my pet project, https://www.loopquest.ai/. I am trying to build a platform like GitHub to let people upload their simulation environments so people can train their AI agents by interacting with the environments created by others. Here is a 2-min demo, https://youtu.be/d53NFjkU7JA. It is not launched yet, but I would love to get some early feedback. Here is the corresponding GitHub repo: https://github.com/LoopMind-AI/loopquest. For now, the package can log env-agent interaction data by adding one extra line of code. You can think of it as similar to https://github.com/google-deepmind/envlogger but with much better backend and frontend support. Any feedback is appreciated :) submitted by /u/jxx123 [link] [comments]  ( 9 min )
    [D] Why async gradient update doesn’t get popular in LLM community?
    The PipeDream-2BW paper and the ZeRO-Offload paper both show that a 1-step delayed asynchronous gradient update doesn't affect convergence (or perplexity) while improving training efficiency by a large margin (by fully utilizing the bubbles in pipeline parallelism). However, neither Megatron-LM nor DeepSpeed uses PipeDream-2BW scheduling. Could anyone share some insights or ideas about why such an efficient scheduling scheme hasn't become popular in the LLM pretraining community? Does it suffer from convergence/accuracy issues in practice? Or are there other concerns that block it from becoming the default / most popular pipeline parallelism schedule? (I posted the same question on Hacker News as well: Why async gradient update doesn't get popular in LLM community? | Hacker News) I have implemented the PipeDream-2BW scheduling scheme on Megatron-LM and can reproduce both the performance gain and the loss convergence with GPT-2 345M using 8xV100 GPUs: https://github.com/sighingnow/Megatron-LM/blob/ht/dev-pipe/megatron/core/pipeline_parallel/schedules.py#L1421 submitted by /u/sighingnow [link] [comments]  ( 9 min )
    [D] IDE?
    What’s the best IDE to work with? Does the best fit depend on the user's needs, or is there one top dog that robustly matches or outperforms the other IDEs? submitted by /u/External_Age_5855 [link] [comments]  ( 9 min )
  • Open

    Neural Networks From Scratch in Rust
    submitted by /u/zezeartix [link] [comments]  ( 8 min )
    Activation function for generating Shapley values
    Hi, I want to train a neural network to calculate Shapley values based on a given characteristic function. Depending on a given characteristic function, calculated through a dedicated algorithm, Shapley values can be any number, positive or negative, without a set range. Because of this, I am unsure, for the specific application of calculating Shapley values, what activation function to use in a neural network that would calculate them. The relu function, as well as leaky relu function, either cannot give values that are negative or have trouble giving large negative values, and sigmoid or tanh can only give values in a certain range. I am aware that there are other commonly used activation functions, but all the ones I could find had one of these issues, which would make training a network to calculate Shapley values difficult. Any advice? submitted by /u/PowNotBigSurprise [link] [comments]  ( 9 min )
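For unbounded real-valued targets like these, a common choice is to keep nonlinear activations (ReLU, tanh, and so on) in the hidden layers and leave the output layer linear, i.e. with no activation at all; the range limits described above only bite when the activation sits on the final layer. To generate training targets or sanity-check a trained network on small games, the exact values can be computed directly. The sketch below does that in plain Python under the assumption that the characteristic function is given as a callable on frozensets; the toy game is made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(frozenset) -> float."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (v(s | {p}) - v(s))
        values[p] = total
    return values


# Toy two-player game in which one player has a negative value.
table = {
    frozenset(): 0.0,
    frozenset("a"): -4.0,
    frozenset("b"): 2.0,
    frozenset("ab"): 1.0,
}
vals = shapley_values(["a", "b"], table.__getitem__)
print(vals)
```

The values always sum to the value of the grand coalition, which gives a cheap consistency check on a network's predictions. The cost is exponential in the number of players, so this only serves as ground truth for small games.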
    A hugging face implementation for style gan to produce user avatar
    I was thinking of creating an app based on StyleGAN that takes a Facebook or Instagram theme and style-transfers it onto a profile picture. Should I build this app or not? I want to know whether it would be a good idea. submitted by /u/No_Claim_8651 [link] [comments]  ( 9 min )
  • Open

    Improve performance of Falcon models with Amazon SageMaker
    What is the optimal framework and configuration for hosting large language models (LLMs) for text-generating generative AI applications? Despite the abundance of options for serving LLMs, this is a hard question to answer due to the size of the models, varying model architectures, performance requirements of applications, and more. The Amazon SageMaker Large Model Inference […]  ( 13 min )
    Index your web crawled content using the new Web Crawler for Amazon Kendra
    In this post, we show how to index information stored in websites and use the intelligent search in Amazon Kendra to search for answers from content stored in internal and external websites. In addition, the ML-powered intelligent search can accurately get answers for your questions from unstructured documents with natural language narrative content, for which keyword search is not very effective.  ( 7 min )
  • Open

    Python code for means
    The last couple of articles have looked at various kinds of means. The Python code for four of these means is trivial: gm = lambda a, b: (a*b)**0.5 am = lambda a, b: (a + b)/2 hm = lambda a, b: 2*a*b/(a+b) chm = lambda a, b: (a**2 + b**2)/(a + b) But the arithmetic-geometric mean […] Python code for means first appeared on John D. Cook.  ( 5 min )
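The excerpt is cut off before the arithmetic-geometric mean, so the following is a minimal sketch of the standard iteration and not necessarily the code from the original post. The AGM of a and b is the common limit reached by repeatedly replacing the pair with its arithmetic and geometric means; the four one-liners from the excerpt are restated as functions so the block is self-contained.

```python
from math import isclose


def gm(a, b):
    """Geometric mean."""
    return (a * b) ** 0.5


def am(a, b):
    """Arithmetic mean."""
    return (a + b) / 2


def hm(a, b):
    """Harmonic mean."""
    return 2 * a * b / (a + b)


def chm(a, b):
    """Contraharmonic mean."""
    return (a**2 + b**2) / (a + b)


def agm(a, b, rel_tol=1e-14):
    """Arithmetic-geometric mean of two positive numbers, by iteration."""
    while not isclose(a, b, rel_tol=rel_tol):
        a, b = am(a, b), gm(a, b)
    return am(a, b)


for f in (hm, gm, agm, am, chm):
    print(f.__name__, f(1.0, 2.0))
```

The iteration converges quadratically, so only a handful of passes are needed. For positive inputs the five means come out in the order hm ≤ gm ≤ agm ≤ am ≤ chm.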
    More ways of splitting the octave
    In an earlier post I said that the arithmetic mean of two frequencies an octave apart is an interval of a perfect fifth, and the geometric mean gives a tritone. This post will look at a few other means. Intervals The harmonic mean (HM) gives a perfect fourth. The arithmetic-geometric mean (AGM) gives a pitch […] More ways of splitting the octave first appeared on John D. Cook.  ( 6 min )
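The interval claims are easy to check numerically. The sketch below takes frequencies f and 2f, computes the three means named in the excerpt, and converts each frequency ratio to equal-tempered semitones (12 per octave); the helper name `semitones` is illustrative.

```python
from math import log2


def semitones(ratio):
    """Size of a frequency ratio in equal-tempered semitones (12 per octave)."""
    return 12 * log2(ratio)


low, high = 1.0, 2.0  # two frequencies an octave apart, in units of the lower one
means = {
    "harmonic": 2 * low * high / (low + high),
    "geometric": (low * high) ** 0.5,
    "arithmetic": (low + high) / 2,
}
for name, f in means.items():
    print(f"{name:10s} ratio {f:.4f} = {semitones(f):.2f} semitones")
```

The ratios are 4/3, √2 and 3/2, which land at roughly 5, exactly 6 and roughly 7 semitones: a perfect fourth, a tritone and a perfect fifth.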
    Maclaurin’s inequality
    This afternoon I wrote a brief post about Terence Tao’s new paper A Maclaurin type inequality. That paper builds on two classical inequalities: Newton’s inequality and Maclaurin’s inequality. The previous post expanded a bit on Newton’s inequality. This post will do the same for Maclaurin’s inequality. As before, let x be a list of real […] Maclaurin’s inequality first appeared on John D. Cook.  ( 5 min )
    Newton’s inequality and log concave sequences
    The previous post mentioned Newton’s inequality. This post will explore this inequality. Let x be a list of real numbers and define S_n(x) to be the average over all products of n elements from x. Newton’s inequality says that S_{n−1} S_{n+1} ≤ S_n². In terminology more recent than Newton, we say that the sequence […] Newton’s inequality and log concave sequences first appeared on John D. Cook.  ( 5 min )
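The definition translates almost directly into code. The sketch below computes S_n(x) by brute force over all n-element subsets and checks the inequality for each interior n on a small list; the function name `sym_mean` and the sample list are illustrative.

```python
from itertools import combinations
from math import comb, prod


def sym_mean(x, n):
    """S_n(x): average of the products over all n-element subsets of x."""
    return sum(prod(c) for c in combinations(x, n)) / comb(len(x), n)


x = [1.0, 2.0, 3.0, 4.0]
S = [sym_mean(x, n) for n in range(len(x) + 1)]  # S[0] is 1 (empty product)
print(S)

for n in range(1, len(x)):
    lhs, rhs = S[n - 1] * S[n + 1], S[n] ** 2
    print(f"n={n}: {lhs:.4f} <= {rhs:.4f}")
    assert lhs <= rhs + 1e-12
```

Brute force enumerates every subset, so it is only practical for short lists; for longer ones the elementary symmetric polynomials can be read off the coefficients of the product of (t + x_i).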
  • Open

    Research Focus: Week of October 9, 2023
    Research Focus: Principal researcher Lester Mackey recognized for pioneering statistical and ML techniques; Pareto frontiers in neural feature learning; structural inequality in the influencer industry; new research on cardinality estimation. The post Research Focus: Week of October 9, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    Take the Wheel: NVIDIA NeMo SteerLM Lets Companies Customize a Model’s Responses During Inference
    Developers have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired locations. NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it […]  ( 6 min )
  • Open

    Gain and bias params in Mujoco
    Hi! I'm new to Mujoco and robot dynamics. When I read the Mujoco document, I'm confused about the gainprm and biasprm parameters. I want to understand the meaning of these parameters and tune the actuation speed of my actuator. An easy-to-understand explanation or supporting material would be appreciated. Thanks in advance. submitted by /u/UpperSearch4172 [link] [comments]
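As far as I understand the MuJoCo documentation, a general actuator with fixed gain and affine bias produces force = gain * ctrl + bias, where gain = gainprm[0] and bias = biasprm[0] + biasprm[1] * length + biasprm[2] * velocity. A position servo with stiffness kp and damping kv then corresponds to gainprm[0] = kp and biasprm = [0, -kp, -kv], which gives force = kp * (ctrl - length) - kv * velocity. The sketch below only evaluates that formula in plain Python to build intuition; it does not call MuJoCo, and the numbers are made up.

```python
def actuator_force(ctrl, length, velocity, gainprm, biasprm):
    """Force of a fixed-gain, affine-bias actuator (plain-Python illustration)."""
    gain = gainprm[0]
    bias = biasprm[0] + biasprm[1] * length + biasprm[2] * velocity
    return gain * ctrl + bias


kp, kv = 50.0, 2.0            # stiffness and damping of a position servo
gainprm = [kp, 0.0, 0.0]
biasprm = [0.0, -kp, -kv]

# Target position 0.3, current position 0.1, currently moving at 0.5.
force = actuator_force(ctrl=0.3, length=0.1, velocity=0.5, gainprm=gainprm, biasprm=biasprm)
print(force)
```

Under this reading, raising kp makes the joint chase the target harder (faster, but more prone to overshoot), while raising kv adds damping that slows it down and suppresses oscillation.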
    LoopQuest, A Github-like platform to host simulation environments for AI training
    Hello everyone! Here is my pet project, https://www.loopquest.ai/. I am trying to build a platform like GitHub to let people upload their simulation environments so people can train their AI agents by interacting with the environments created by others. Here is a 2-min demo, https://youtu.be/d53NFjkU7JA. It is not launched yet, but I would love to get some early feedback. Here is the corresponding GitHub repo: https://github.com/LoopMind-AI/loopquest. For now, the package can log env-agent interaction data by adding one extra line of code. You can think of it as similar to https://github.com/google-deepmind/envlogger but with much better backend and frontend support. Any feedback is appreciated :) submitted by /u/jxx123 [link] [comments]

  • Open

    [D] On-Chain Reputation Model
    I am relatively new to machine learning, and I am thinking about building an on-chain reputation ML model. Here is how far I have gone in my ideation phase, can someone help with some suggestion on how I can approach this issue. Input data could include on-chain activity like number of transactions, value transferred, smart contracts interacted with, tokens held, NFTs owned, etc. Additionally, data from off-chain sources could be incorporated like identity verification, credentials, ratings, reviews, social media profiles, etc. Supervised learning algorithms like regression or classification models could be used to predict a reputation score. The target variable would be some verified reputation rating. Models like linear regression, random forests, or neural networks could work. Choice depends on size of data and complexity needed. Model would need to be transparent and parameters verifiable on-chain for validity. So linear models or simple neural networks may be most practical initially. The model could be trained off-chain initially but ultimately parameters and logic stored on-chain. Predictions could also be verified on-chain. Careful feature selection is important so the model relies on signals that are resistant to manipulation and capture true reputation. The model would need continuous updates as new data comes in reflecting latest reputation. This would require clear on-chain governance. Issues like privacy, collusion resistance, and censorship resistance would need to be addressed through crypto mechanisms like zero-knowledge proofs. P.S. This is a personal project I want to attempt to level up my ML skills. submitted by /u/AdParticular2891 [link] [comments]  ( 9 min )
    [D] Pivoting jobs to ML
    Hi everyone, I recently started a job as a Junior Data Engineer. I have learned a lot so far working with DBT, Snowflake, Looker, Jira workflow, and Git using SQL and Python. I plan to stay at this company for 2 years. My boss has assured me that if I work hard I will progress from a Junior to full Data Engineer. After 2-3 years as a DE, I want to level up and move towards Data Science/ ML roles. My questions are: What other skills should I learn to enable me to pivot into something ML related? Should I find a job as a Data Scientist first, then try for ML jobs? Just looking for some advice/suggestions. Thanks! submitted by /u/SydeFxs [link] [comments]  ( 9 min )
    Problem solving in programming [D]
    Hello Redditors, I am a student who is currently studying Bachelor of Science in AI. I have a question regarding improving my coding skills. I am aiming for a research internship and I don't know where to start. I previously took a summer school that taught me a lot about state-of-the-art models such as GANs, Transformers, VAEs, GNNs, etc. I would like to improve my coding skills, specifically problem-solving and writing clean code. I have experience with deep learning in general and data analysis. I am looking for a research internship next summer. Where should I start? I plan to review some of the deep learning material in the Deep Learning Specialization before taking the GAN specialization. However, when it comes to coding, I want to think like a software engineer or a great programmer. What do you guys suggest for improving my coding or problem-solving skills? I'm feeling confused with multiple resources and I don't know where to begin. I’d really appreciate your help. submitted by /u/misplacedlion [link] [comments]
    Random forest trained on insider trades [D]
    Would be very appreciative if someone looked at these results and pointed out potential / actual flaws. Dataset basics: insider trade details, insider trades over the last month, insider trades over the last week, (…) stock return over the last month (…), 46 columns total. Labels… 0: -5% + 5% Dates predicted: reported date. Usually 2-3 days behind transaction. Also, not positive if results are significant in the first place so that would be a great call out as well. Colab notebook: https://colab.research.google.com/drive/1fO1hVsVMWN3TORNj4OQn5UbWQOeug4fi?usp=sharing submitted by /u/This_Cardiologist242 [link] [comments]  ( 9 min )
    [R] ALMT: Using text to narrow focus in multimodal sentiment analysis improves performance
    Multimodal sentiment analysis combines text, audio and video to understand human emotions. But extra inputs can add irrelevant or conflicting signals, so filtering matters. Researchers made an "Adaptive Language-guided Multimodal Transformer" (ALMT) that uses text to guide filtering of visual and audio data. This creates a "hyper-modality" with less noise that complements the text. They tested it on MOSI (YouTube reviews), MOSEI (YouTube clips) and CH-SIMS (Chinese videos). ALMT achieved improved accuracy: MOSI: YouTube movie reviews with 2,199 samples. ALMT achieves state-of-the-art performance on various metrics including 6% higher 7-class accuracy. MOSEI: 22,856 YouTube clips covering sentiment-rich scenarios. ALMT improves multi-class accuracy by 3-5% over previous methods. CH-SIMS: Chinese dataset with over 2,000 video samples. ALMT surpasses prior work by 1.4% in binary accuracy. Ablation analyses showed big drops in performance without the guided filtering, which supports the claim that it is the main innovation. Downsides are that it needs lots of training data and shows only minor gains on sparse regression metrics. But overall the technique of filtering multimodal data under text guidance gives improvements. The concepts feel intuitive - use dominant signals to filter others and retain useful complements. My guess is it would transfer well to other multimodal tasks. TLDR: New way to filter multimodal data for sentiment analysis using text guidance improves performance. Shows the value in removing distracting signals. Sometimes less is more. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Has anyone evaluated Tiktok's algorithm for their recsys use case? [D]
    As a disclaimer, I am not familiar with many RecSys benchmarks. I know TikTok published a white paper on their purported algorithm, Monolith, but it is unclear whether that is what they actually use in their products. Given that recommender systems seem to be core to ByteDance's business, I imagine they wouldn't provide many details. Has anyone evaluated Monolith on their own products and seen an improvement? I think the app is impressive and am wondering how it has transferred to other use cases. submitted by /u/HybridRxN [link] [comments]
    [P] Optimistix, nonlinear optimisation in JAX+Equinox!
    Hi everyone! I wanted to advertise my new JAX optimisation library Optimistix! Optimistix has high-level APIs for minimisation, least-squares, root-finding, and fixed-point iteration and was written to take care of these kinds of subroutines in Diffrax. Here is the GitHub: https://github.com/patrick-kidger/optimistix The elevator pitch is that Optimistix is really fast, especially to compile. It plays nicely with Optax for first-order gradient-based methods, and takes a lot of design inspiration from Equinox, representing the state of all the solvers as standard JAX PyTrees. For those familiar with classical nonlinear unconstrained optimisation, Optimistix does some pretty nifty new things. It introduces new abstractions for modular optimisers, allowing users to mix-and-match different optimisation techniques easily. For example, creating a BFGS optimiser with Levenberg-Marquardt style Tikhonov regularisation takes less than 10 lines of code in Optimistix. I'm using Optimistix as a tool for my own research, and continue to work on it as part of my PhD (supervised by Patrick Kidger). I would love for some more people to try it, so let me know what you think! submitted by /u/packquickly [link] [comments]  ( 9 min )
    [D] Document layout - recreating the structure
    Hello, Document layout analysis has been a great tool so far to extract the components of a document (title, paragraph, tables ...). I'm working on long-text PDFs which are mostly scanned documents. One of the steps after document layout analysis is to recreate the document structure: creating sections, sub-sections, sub-sub-sections and so on. As of today, this task is done by parsing each title and looking for ordering information (numeric, alphabetical or roman notation), for example: "1. Title A", "1.1 Title B", "2. Title C", "2.a) Title D". This technique works only if a document follows this numbering convention. I want to go one step further, where the algorithm could create the document structure whatever ordering information the titles carry, or even when they carry none. I believe that relying only on parsing cannot do the trick. What could be the options, given that the only features are the title's text and the title's position (x, y) in the document? I was wondering if a model like a seq2seq could fit this problem, or should I stick with an engineering rule-based approach. Thanks submitted by /u/mathrb [link] [comments]  ( 9 min )
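    The numbering-based baseline described in the post can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the helper names `numbering_depth` and `outline` are hypothetical, and the rule for what counts as a numbering label (digits, a single letter, or a roman numeral per component) is an assumption that will misfire on titles starting with a one-letter word such as "A".

```python
import re

# One component of a numbering label: digits, a single letter, or a roman numeral.
# (Assumed convention for this sketch; real documents may need more cases.)
PART = re.compile(r"^(\d+|[A-Za-z]|[ivxlcdm]+|[IVXLCDM]+)$")


def numbering_depth(title):
    """Return the nesting depth implied by a leading label such as
    '1.', '1.1' or '2.a)'. Returns 0 when no numbering label is found."""
    tokens = title.split(maxsplit=1)
    if len(tokens) < 2:
        return 0
    label = tokens[0].rstrip(".)")      # '2.a)' -> '2.a'
    parts = re.split(r"[.)]", label)    # '2.a'  -> ['2', 'a']
    if not all(PART.match(p) for p in parts):
        return 0
    return len(parts)


def outline(titles):
    """Pair every title with its inferred depth, in reading order."""
    return [(numbering_depth(t), t) for t in titles]


titles = ["1. Title A", "1.1 Title B", "2. Title C", "2.a) Title D"]
for depth, title in outline(titles):
    print("  " * max(depth - 1, 0) + title)
```

    Titles that return depth 0 are exactly the cases the post is asking about; for those, features such as the x position (indentation) or font size from the layout model would have to replace the label, which this sketch does not attempt.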
    [R] Is there an established method to test if something has been memorized / seen by black-box LLMs?
    I am using ChatGPT and other LLMs for which the training data is unknown. I am using them to test a set of multiple-choice questions from a medical test published after the models' knowledge cutoff. However, I cannot be 100% sure the questions were not on the internet beforehand. Is there any established method or test suite to try to understand whether a given instance has been seen at training time? All I can think of is looking at memorization or at perplexity, but I was looking for a more out-of-the-box methodology that people use. It seems to me that the problem is quite general. Thanks! Edit: I know LLMs do not just memorize things but also learn patterns. However, there is research on trying to understand whether a datapoint has been used in training or not. E.g. there is research that tries to exploit the fact that seen text normally has lower perplexity than unseen text, or other similar information. I was wondering what the state of this topic is and whether something is normally used as a score to get some clues. I do not expect to be able to retrieve the exact same questions lol submitted by /u/ombelicoInfinito [link] [comments]  ( 9 min )
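    The perplexity idea mentioned in the edit can be made concrete with a small sketch. It assumes per-token natural-log probabilities are available for each question, which many hosted chat models do not expose, so treat that as a prerequisite rather than a given. The function names and the "mean of the least likely tokens" variant are illustrative choices, not an established test suite.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def lowest_k_mean(token_logprobs, k=0.2):
    """Mean log-probability of the k fraction of least likely tokens.
    Text seen in training tends to have fewer very surprising tokens,
    so a higher (less negative) value is weak evidence of exposure."""
    n = max(1, int(len(token_logprobs) * k))
    return sum(sorted(token_logprobs)[:n]) / n


# Hypothetical log-probs for one question (made-up numbers for illustration).
example = [-0.1, -0.2, -3.0, -0.3, -0.4]
print(perplexity(example), lowest_k_mean(example, k=0.2))
```

    Neither score means much in isolation. A usual way to read them is relative to a reference set, for example questions written after the experiment started and therefore certainly unseen.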
    [D] Extracting Multi-modal embeddings (Image + text) to be used for visual similarity purposes
    I am looking for methods/frameworks to extract multi-modal embeddings from images and text for similarity search purposes. The problem setup is slightly different from how CLIP-style methods are generally used (where the similarity between the text and image embeddings produced by the model is computed to assess how well a caption matches an image). My intended application is similarity search, where I want to find image-caption pairs similar to a query image and caption encoded together. Some approaches I tried: I tried concatenating the textual and visual embeddings obtained from CLIP, as well as ResNet visual embeddings with textual embeddings, and using the result with cosine similarity, but it had limited utility. My guess is that simply concatenating two modalities without any training yields very little utility. The next direction could be to train a model to fuse the embeddings obtained, but my dataset is really small (10 thousand samples in total), so I am not sure if training a model would be helpful. Are there any approaches that would allow me to combine the multi-modal embeddings for similarity purposes, similar to how a pre-trained ResNet or Inception can be used off-the-shelf for retrieving visually similar images? Any pointers/advice would be greatly appreciated. submitted by /u/No-Commission3556 [link] [comments]  ( 9 min )
    [P][D] Building Datasets
    In my ML/AI journey up until now, most training and hands-on labs either use a pre-built dataset or have you build a pretty simple and flat dataset. I am now looking to stretch my exploration into some real-world use cases and find the data I want is way more complex. Researching online feels like the meme on learning to draw an owl. So I'm looking for some guidance on how to handle my data. The data is an array from a REST API that includes all alarms from an application as nested objects. So the data looks like this for a single event: data = { "event_data": [ { "root_cause": "Root cause added after API calls", "alarms": [ { "alarm_id": "alarm_id", "alarm_name": "alarm_name", "alarm_type": "alarm_type", "alarm_description": "alarm_description", "alarm_details": { prop1: val1, prop2: val2, etc... }, "actual_alarm_value": { any_random_key: "any_random_value", etc... }, } ], } ] } I need to build a dataset that includes many of these events with the ultimate goal of predicting future events. I plan to test this against various ML models and LLMs. Each event would be a single row, and I would flatten out each alarm so each nested property has its own column. Where I need clarification is how to handle the flattening of alarms. If I fully flatten them, it appears like I lose the context of the alarm's parent event. But if I only flatten them to the alarm level, I lose each property having its own row. Also, actual_alarm_value is very random, so my thinking is to use string encoding here. I know this is a lot of detail, and I appreciate any and all advice and help in learning how to do this. submitted by /u/that1guy15 [link] [comments]  ( 9 min )
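    One possible layout for the flattening question above, sketched with the standard library only: one row per alarm, with dotted column names for nested properties and the parent event's index and root cause repeated on every row so the event context is not lost. The column names `event_id` and the helper names are hypothetical, and serialising `actual_alarm_value` to a JSON string is just one way to do the "string encoding" the post mentions.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def alarm_rows(data):
    """Return one flat row per alarm, each tagged with its parent event."""
    rows = []
    for event_id, event in enumerate(data["event_data"]):
        for alarm in event["alarms"]:
            alarm = dict(alarm)  # shallow copy so the input is not modified
            # Free-form field: keep it as one string column instead of
            # exploding unpredictable keys into columns.
            raw_value = alarm.pop("actual_alarm_value", {})
            row = {"event_id": event_id, "root_cause": event.get("root_cause")}
            row.update(flatten(alarm))
            row["actual_alarm_value"] = json.dumps(raw_value, sort_keys=True)
            rows.append(row)
    return rows


sample = {
    "event_data": [
        {
            "root_cause": "Root cause added after API calls",
            "alarms": [
                {
                    "alarm_id": "a1",
                    "alarm_name": "cpu_high",
                    "alarm_details": {"prop1": "val1", "prop2": "val2"},
                    "actual_alarm_value": {"k": "v"},
                }
            ],
        }
    ]
}
for row in alarm_rows(sample):
    print(row)
```

    Grouping the resulting rows by `event_id` recovers the per-event view when a model needs one example per event rather than one per alarm.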
    [D] Is there a REST API for text embeddings?
    I'm aware there are commercial offerings like OpenAI and cohere with the embedding API. But what about for open source models like the ones from SentenceTransformers? I'm aware you can use the HuggingFace inference API, but it's probably not best for commercial use, in which case the Inference endpoints would be better, but it's quite pricey for a startup with no customers. I also know I could use some kind of serverless GPU / inference platform to create my own API. But is there just a straight-up REST API for getting text embeddings from a model via SentenceTransformers or other HuggingFace models? submitted by /u/TheSaasDev [link] [comments]  ( 9 min )
    [D] Language Confusion.
    I am a second-year student. I'm planning to start learning ML, which obviously requires Python. But at the same time I wanna start practicing DSA / competitive programming as well. I'm sorta in this dilemma of what to do. Since Python is a must for ML I'm 100% doing it, but for DSA I am confused whether I should learn DSA in Python or C++. People say C++ is the best and ideally I should do that. But Python suits my needs more. Obviously I don't mind doing both languages together but it seems a bit redundant. P.S: I'm learning DS basics in college via the C language, so learning the basic concepts isn't an issue. What do you suggest? submitted by /u/No-Discipline-2354 [link] [comments]
    [Project] I created a tool that navigates the Internet and scrapes data using GPT-4
    Hi! I created a universal data API that uses headless browsers and GPT to extract any data from the web in JSON format. I started this project because I needed some API to do data enrichment to get company data (headcount, investment rounds, etc.). Once I did the first version, I quickly realized that there can be many use cases for such a tool: data enrichment, web scraping, data validation, etc. You can get the early access to the API here: https://singleapi.co/ Thanks! submitted by /u/semanser [link] [comments]  ( 9 min )
    Applied AI/ML/ Data Science MS in Germany [D]
    Hey folks, I graduated from a tier 2 college in India with an ECE degree and then started working as an ML engineer in a mid-size startup 2 years ago. (1 year of internship + 1 year of Full time employment at the same company). Now, I am looking to get a Master's Degree in AI/ML/DS in Germany starting Winter 2024. I am a person with interests in Industry skills(Applied AI/ML) rather than the research/academia part as I don't wish to pursue a PhD nor do I want to be stuck in a Math-deep subject that may not be relevant for me in the future. On account of this, I wanted to know which college/degree offers the best balance in-between theory and applied AI/ML/DS. Also, people have been telling me that exams are super tough and it is hard to successfully complete an AI/ML/Data Science MS degree in Germany, Is it true? It has been super discouraging for me to hear this and is affecting me mentally to go through the application process. PS. CS/Electrical Degrees with good electives for AI/ML/DS are also good enough for me (Just hoping the coursework/grading is not too harsh) Also, it would be great if someone could clarify if an Electronics and Communications student can apply for a CS degree in Germany. Sorry for asking too many questions, TIA. :) submitted by /u/TheDivineKnight01 [link] [comments]  ( 9 min )
    [D] Prompting as searching through a space of vector programs
    Enlightening article from Francois Chollet about #LLMs and embeddings "Prompt engineering is the process of searching through program space to find the program that empirically seems to perform best on your target task." ​ https://fchollet.substack.com/p/how-i-think-about-llm-prompt-engineering submitted by /u/alexisperrier [link] [comments]  ( 9 min )
    [D] Best approach to verify 4 million sentence-named entity pairs ?
    I have a dataset of about 4 million pairs of sentence-named entity. Looks like this: Sentence: MarketWatch has reached out to Charles Schwab and GQG for comment. Corresponding NER Tags: [{'end': 6, 'entity': 'B-ORG', 'index': 1, 'score': '0.98322886', 'start': 0, 'word': 'Market'} {'end': 7, 'entity': 'I-ORG', 'index': 2, 'score': '0.969261', 'start': 6, 'word': '##W'} {'end': 11, 'entity': 'I-ORG', 'index': 3, 'score': '0.97644824', 'start': 7, 'word': '##atch'} {'end': 38, 'entity': 'B-PER', 'index': 8, 'score': '0.9927636', 'start': 31, 'word': 'Charles'} {'end': 41, 'entity': 'I-PER', 'index': 9, 'score': '0.99394774', 'start': 39, 'word': 'Sc'} {'end': 44, 'entity': 'I-PER', 'index': 10, 'score': '0.41437265', 'start': 41, 'word': '##hwa'} {'end': 45, 'entity': 'I-PER', 'index': 11, 'score': '0.46933985', 'start': 44, 'word': '##b'} {'end': 51, 'entity': 'B-ORG', 'index': 13, 'score': '0.9984176', 'start': 50, 'word': 'G'} {'end': 52, 'entity': 'I-ORG', 'index': 14, 'score': '0.99367344', 'start': 51, 'word': '##Q'} {'end': 53, 'entity': 'I-ORG', 'index': 15, 'score': '0.99617106', 'start': 52, 'word': '##G'}] What would be a good approach to verify the correctness of each item? submitted by /u/shardblaster [link] [comments]  ( 9 min )
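    A cheap first pass over data in this shape is to merge the word-piece tags back into whole entities and flag those whose weakest piece has a low score, so that only a small fraction needs manual or model-based review. The sketch below uses the example from the post. The function names and the 0.5 threshold are arbitrary assumptions, and the score strings are converted to floats.

```python
def merge_wordpieces(sentence, tags):
    """Group B-/I- word-piece tags into entities using character offsets."""
    entities = []
    for tag in tags:
        prefix, label = tag["entity"].split("-", 1)
        score = float(tag["score"])
        last = entities[-1] if entities else None
        continues = (
            prefix == "I"
            and last is not None
            and last["label"] == label
            and tag["index"] == last["last_index"] + 1  # adjacent token
        )
        if continues:
            last["end"] = tag["end"]
            last["last_index"] = tag["index"]
            last["min_score"] = min(last["min_score"], score)
        else:
            entities.append({"label": label, "start": tag["start"],
                             "end": tag["end"], "last_index": tag["index"],
                             "min_score": score})
    for ent in entities:
        ent["text"] = sentence[ent["start"]:ent["end"]]
    return entities


def needs_review(entities, threshold=0.5):
    """Entities whose least confident word piece scores below the threshold."""
    return [e for e in entities if e["min_score"] < threshold]


sentence = "MarketWatch has reached out to Charles Schwab and GQG for comment."
tags = [
    {"end": 6, "entity": "B-ORG", "index": 1, "score": "0.98322886", "start": 0},
    {"end": 7, "entity": "I-ORG", "index": 2, "score": "0.969261", "start": 6},
    {"end": 11, "entity": "I-ORG", "index": 3, "score": "0.97644824", "start": 7},
    {"end": 38, "entity": "B-PER", "index": 8, "score": "0.9927636", "start": 31},
    {"end": 41, "entity": "I-PER", "index": 9, "score": "0.99394774", "start": 39},
    {"end": 44, "entity": "I-PER", "index": 10, "score": "0.41437265", "start": 41},
    {"end": 45, "entity": "I-PER", "index": 11, "score": "0.46933985", "start": 44},
    {"end": 51, "entity": "B-ORG", "index": 13, "score": "0.9984176", "start": 50},
    {"end": 52, "entity": "I-ORG", "index": 14, "score": "0.99367344", "start": 51},
    {"end": 53, "entity": "I-ORG", "index": 15, "score": "0.99617106", "start": 52},
]
entities = merge_wordpieces(sentence, tags)
print([(e["text"], e["label"]) for e in entities])
print([e["text"] for e in needs_review(entities)])
```

    If the tags come from the Hugging Face token-classification pipeline (the output format suggests so, but that is an assumption), its `aggregation_strategy` option performs a similar grouping at inference time.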
  • Open

    MusicGPT: Create unique music from text prompts
    submitted by /u/SaucySporky [link] [comments]
    Website to do the Following: I Give it a Design and Create an Image With it
    Hello all, I am not sure this is out yet. I would like to find a website where i can upload an image I own, and have it generate another image around it. Let's say I have some shirts that say 'HOLA'. I would want, for example, to generate an image of Socrates wearing said shirt. Is this possible? If so, which site would allow me to do this? ​ Cheers and merci! submitted by /u/JYanezez [link] [comments]
    So far, AI hasn't been profitable for Big Tech
    Big Tech companies like Microsoft and Google are grappling with the challenge of turning AI products like ChatGPT into a profitable enterprise. The cost of running advanced AI models is proving to be a significant hurdle, with some services driving significant operational losses. Corporate customers are unhappy with the high running costs of AI models. The nature of AI computations, which require new calculations for each query, makes flat-fee models risky. Some companies are trying to dial back costs, while others continue to invest more deeply in AI tech. Microsoft's GitHub Copilot, which assists app developers by generating code, has been operating at a loss despite attracting more than 1.5 million users. One of the reasons AI services are costly is that some companies have been reaching for the most powerful AI models available. Microsoft has been exploring less costly alternatives for its Bing Chat search engine assistant. Advances in AI acceleration hardware may eventually reduce the costs of operating complex models. Experts anticipate a more stringent financial approach in the near future, transitioning from experimental budgets to focusing on profitability. Source : https://arstechnica.com/information-technology/2023/10/so-far-ai-hasnt-been-profitable-for-big-tech/ submitted by /u/NuseAI [link] [comments]
    Dubbing By ElevenLabs. Share your fav videos in your native language!! Go try
    submitted by /u/ShooBum-T [link] [comments]
    The environmental impact of the AI revolution is starting to come into focus
    The environmental impact of the AI revolution is starting to become clear, with generative AI like ChatGPT increasing Google Search's energy use more than tenfold. The worry is that the computing power required for AI could lead to increased energy consumption and carbon footprint of data centers. AI already accounted for 10 to 15 percent of Google's electricity consumption in 2021. Google claims that the energy needed to power AI technology is increasing at a much slower rate than predicted, and they are implementing practices to reduce the carbon footprint of AI workloads. The worst-case scenario of Google Search using as much electricity as Ireland is unlikely, but the potential energy consumption of AI servers could grow significantly if AI's popularity continues to rise. Source : https://www.theverge.com/2023/10/10/23911059/ai-climate-impact-google-openai-chatgpt-energy submitted by /u/NuseAI [link] [comments]
    AI Political Analysis Test: Describe the commonalities between the Palestinians fighting for a homeland and the American Revolution.
    The Palestinians fighting for a homeland and the American Revolution have a number of commonalities, including: Fighting against a colonial power: Both the Palestinians and the American colonists were fighting against a colonial power that had imposed its rule on them. In the case of the Palestinians, the colonial power is Israel, which was established in 1948 on land that had been inhabited by Palestinians for centuries. In the case of the American colonists, the colonial power was Great Britain. Fighting for self-determination: Both the Palestinians and the American colonists were fighting for their right to self-determination, or the right to govern themselves. The Palestinians want to establish their own independent state, while the American colonists wanted to break away from Gr…
    I made "Pi: your personal IA" to have an opinion.
    submitted by /u/LonePrron [link] [comments]
    IBM CEO: Washington should hold tech firms accountable for AI
    submitted by /u/smo279 [link] [comments]
    Automated my Youtube Channel Using GPT 4
    Hi Everyone, I have automated the content creation for my YouTube channel. It has a total of 8.5K views, with some videos getting 2.5K views. https://www.youtube.com/channel/UCG0-UemyRMUs1JJlQMK9lzA Everything is automated: script generation, voiceover, image generation, and subtitles. I do minor tweaks here and there, but it is mostly automated. I posted it somewhere and people were asking what the use of these mindless videos is. This is just the beginning: I want to automate the editing of videos, so that a user can upload raw videos and I can give back multiple final edited videos. I have built a small tool, blinkcuts.com. If anyone is interested, I can give access. Please DM for access. submitted by /u/raxrb [link] [comments]
    Saudi-China collaboration raises concerns about access to AI chips
    Saudi-China collaboration raises concerns about access to AI chips. Source : https://www.ft.com/content/2a636cee-b0d2-45c2-a815-11ca32371763 submitted by /u/NuseAI [link] [comments]
    Looking for a free AI tool that removes noise from video
    Hey, I am looking for a free AI tool that removes noise from video. If there is one, please suggest it. Thank you in advance. submitted by /u/Haziq12345 [link] [comments]
    How do AI-driven demand forecasting models handle market volatility and unexpected events, such as economic crises or pandemics?
    If you have any resources then do share. submitted by /u/Cygnet-Digital [link] [comments]
    AI Power Distribution Scenarios.
    submitted by /u/Philipp [link] [comments]
    As drone traffic increases, researchers turn to AI to help avoid collisions
    submitted by /u/Tao_Dragon [link] [comments]
    Is this a viable approach for a small plant manufacturing engineer?
    I'm a small plant engineer who covers manufacturing, process, quality, and new product design. I wear many hats in my job and it's a lot of responsibility. One way I've attempted to tame the complexity is by using good reference books. I've accumulated quite the collection through the years. Some print others digital. I've also got a lot of digital notes. And that's a lot of data. I've been playing around with sharly.ai (thanks to this sub for recommending) and uploading documents to it and querying them. Its been able to find the information every time it's been available. And more importantly it's provided sources and page numbers. This is important, since I've never been able to find a conversational AI that gives me consistently good answers (including the latest chatgpt), and I always need to read deeper. I also need to backup my work. So in this way it's basically a super index. I also bought a tablet for note-taking and basic sketches. The idea is to use the tablet to take notes, hold my library for reading, and interact with sharly.ai. Is this approach good enough, or is there something else I can do? submitted by /u/Aggressive_Ad_507 [link] [comments]
  • Open

    Issue with MuJoCo Simulation: Robot Penetrates the Ground
    Hello everyone, I'm working on simulating a modified humanoid robot, "DARwIn OP 3", using MuJoCo through dm_control in Python. My goal is to train the model to ascend stairs rapidly but these are the first steps. However, I've encountered a problem where the robot appears to sink into the ground and is then ejected with significant force under specific conditions. ​ https://reddit.com/link/174vpzw/video/u636vf49tftb1/player Environment: MuJoCo via dm_control. Issue Description: When the robot falls and its feet move, it behaves as though one of its motors sinks into the floor. Attempts: I've tweaked contact parameters and ground properties with no luck. Interestingly, this doesn't occur in the standalone MuJoCo simulator. Visual Aid: I've attached a video to illustrate the problem…
    Algorithms for average reward reinforcement learning in continuous/general state-action space
    I see that discounted-reward reinforcement learning has been extensively studied in the literature. However, the average-reward criterion receives less attention, and it looks like there are far fewer algorithms for it (R-learning, H-learning, SMART, etc.) than for the discounted criterion. Could you suggest any algorithms for average-reward reinforcement learning in continuous/general state-action spaces? submitted by /u/S1gnature [link] [comments]
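    For readers unfamiliar with the average-reward setting, here is a minimal tabular sketch of an R-learning style update, one of the algorithms named in the post. It follows one common textbook formulation (value differences are measured relative to a running estimate `rho` of the average reward, with no discount factor). It is tabular only and so does not answer the continuous-space question; the step sizes are arbitrary.

```python
def r_learning_step(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one R-learning style update in place on Q and return the new rho.

    Q   : list of lists, Q[state][action] relative action values
    rho : current estimate of the average reward per step
    """
    # Was the chosen action greedy before the update? rho is only
    # adjusted on greedy steps in this formulation.
    was_greedy = Q[s][a] == max(Q[s])
    # Average-reward TD error: no discounting, reward measured against rho.
    delta = r - rho + max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * delta
    if was_greedy:
        rho += beta * delta
    return rho


# Tiny usage example: two states, two actions, all values start at zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
rho = r_learning_step(Q, 0.0, s=0, a=0, r=1.0, s_next=1, alpha=0.5, beta=0.1)
print(Q, rho)
```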
    "How Disney Packed Big Emotion Into a Little Robot" (sim2real)
    submitted by /u/gwern [link] [comments]
    I took OpenAI's paper about defeating Dota2 world champions, and explained it paragraph-by-paragraph.
    submitted by /u/mngrwl [link] [comments]
    What's your view on the recent RT-X efforts/scaling via IL?
    With recent RT-X efforts from Deepmind, it seems the community has been shifting towards the development of a more generalized foundational model, combining with visions and languages, and scaling via imitation learning. I know RL algorithms are expensive to train and hard to scale due to the way the samples are generated, but I am still fascinated by the intelligence behind their philosophies. What do you think the future would look like? Like NLP or CV, having a big foundational model pre-trained via IL, and fine-tune on different tasks via RL? How can we tell if a task is simple enough that we don't need to leverage the power of a foundational model? submitted by /u/Old_Reading_669 [link] [comments]
  • Open

    U statistics and a new paper by Terence Tao
    Terence Tao has a new paper out that relates to a couple things I’ve written about recently. Elementary symmetric polynomials came up when developing the general equations for tangent sum and hyperbolic tangent sum. The latter post goes into more detail. Before that, means of symmetric functions, not necessarily elementary polynomials or even polynomials, came up […] U statistics and a new paper by Terence Tao first appeared on John D. Cook.  ( 5 min )
    Detecting fraud with the GRIM test
    The latest episode of Erik Seligman’s podcast is entitled The Grim State of Modern Pizza. Although you might not realize it from the title, the post is about fraud detection. GRIM stands for Granularity-Related Inconsistency of Means. In a nutshell, the test looks for means (averages) that are not possible on number theoretic grounds. If […] Detecting fraud with the GRIM test first appeared on John D. Cook.  ( 5 min )
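    The core check behind the GRIM idea can be sketched in a few lines, assuming the underlying responses are integers (for example Likert-scale answers) and the mean is reported to a fixed number of decimals. The function name is hypothetical, and Python's built-in `round` is used, so exact rounding ties may be handled differently from whatever a paper's authors used.

```python
import math


def grim_consistent(reported_mean, n, decimals=2):
    """True if some integer total over n integer-valued responses
    produces a mean that rounds to reported_mean."""
    target = round(reported_mean, decimals)
    approx_total = reported_mean * n
    # Only integer totals next to reported_mean * n can round to the target.
    for total in range(math.floor(approx_total) - 1, math.ceil(approx_total) + 2):
        if round(total / n, decimals) == target:
            return True
    return False


# With 28 integer responses, possible means near 5.2 are 145/28 = 5.18
# and 146/28 = 5.21, so a reported 5.19 cannot occur.
print(grim_consistent(5.18, 28))
print(grim_consistent(5.19, 28))
```

    The test only has bite when n is small relative to the reporting precision: with two decimals and n of 100 or more, every two-decimal mean is attainable.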
    Tritone
    A few weeks ago I wrote about how the dissonance of a musical interval is related to the complexity of the frequency ratio as a fraction, where complexity is measured by the sum of the numerator and denominator. Consonant intervals have simple frequency ratios and dissonant intervals have complex frequency ratios. By this measure, the […] Tritone first appeared on John D. Cook.  ( 6 min )
    When a function cannot be extended
    The relation between a function and its power series is subtle. In a calculus class you’ll see equations of the form “series = function” which may need some footnotes. Maybe the series only represents the function over part of its domain: the function extends further than the power series representation. Starting with the power series, […] When a function cannot be extended first appeared on John D. Cook.  ( 5 min )
  • Open

    DSC Weekly 10 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 10 October 2023 appeared first on Data Science Central.  ( 20 min )
    How to ensure data security when sharing business-critical information
    Introduction  In an era where data is often termed the ‘new oil,’ its security holds unparalleled importance for businesses across industries. With the proliferation of digital platforms, sharing business-critical information has become routine yet perilous. From financial records to customer data, organizations frequently exchange sensitive information that, if compromised, could have dire consequences. Given the… Read More »How to ensure data security when sharing business-critical information The post How to ensure data security when sharing business-critical information appeared first on Data Science Central.  ( 21 min )
    How does combining blockchain and AI create new business opportunities?
    Gartner predicts blockchain’s economic impact to reach $176 billion by 2025 and $3.1 trillion by 2030. The AI software market is expected to reach $134.8 billion by 2025. Blockchain and AI benefit businesses. AI models process data, extract insights, and make decisions. Blockchain ensures data integrity and trust among participants. Read on to discover the… Read More »How does combining blockchain and AI create new business opportunities? The post How does combining blockchain and AI create new business opportunities? appeared first on Data Science Central.  ( 22 min )
    Understanding the difference: Data analyst, data scientist, and data engineer
    In the contemporary digital landscape, data has emerged as a critical asset for organizations aiming to make informed decisions and foster innovation. Data analytics can unlock a treasure trove of insights, driving competitive advantage and operational excellence by leveraging the vast amounts of data generated every second. As a consequence, the demand for skilled professionals… Read More »Understanding the difference: Data analyst, data scientist, and data engineer The post Understanding the difference: Data analyst, data scientist, and data engineer appeared first on Data Science Central.  ( 24 min )
    11 Questions Every CEO Should Ask about AI / Generative AI
    I’ve been in this industry for over 40 years (yes, I just started in the data and analytics industry when I was 11), and I have NEVER seen anything like Artificial Intelligence (AI) and Generative AI (GenAI) capture the attention of CEOs (and the dystopic fear of everyone else). Is AI a game-changer?  Definitely!  Will… Read More »11 Questions Every CEO Should Ask about AI / Generative AI The post 11 Questions Every CEO Should Ask about AI / Generative AI appeared first on Data Science Central.  ( 23 min )
  • Open

    New – No-code generative AI capabilities now available in Amazon SageMaker Canvas
    Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document […]  ( 7 min )
    Whisper models for automatic speech recognition now available in Amazon SageMaker JumpStart
    Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680 thousand hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need […]  ( 11 min )
    Reinventing a cloud-native federated learning architecture on AWS
    In this blog, you will learn to build a cloud-native FL architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.  ( 12 min )
  • Open

    MAXimum AI Performance: Latest Adobe Updates Accelerated by NVIDIA GPUs Improve Workflows for Millions of Creatives
    Generative AI is helping creatives across many industries bring ideas to life at unprecedented speed. This technology will be on display at Adobe MAX, running through Thursday, Oct. 12, in person and virtually.  ( 9 min )
  • Open

    Riddle me this: Issues when predicting a high frequency sine wave
    Hi folks, I have observed a strange behavior when implementing a VERY BASIC idea 🙂 I want to use a fully-connected neural network to approximate a sine wave. For that I am sampling 200.000 uniformly distributed points from a wide interval, e.g. [-60,60], and computing the corresponding sin(x) values as training data. A glimpse into my setup: Model: nn.Linear(1, 16) nn.Sigmoid() nn.Linear(16, 16) nn.Sigmoid() nn.Linear(16, 8) nn.Sigmoid() nn.Linear(8, 4) nn.Sigmoid() nn.Linear(4, 1) (I also pumped up the network to up to 100 hidden neurons on one layer) Number of samples: 200.000 (80% train / 20% test) Optimizer: Adam Loss: RMSE Epochs: between 100 and 500 Learning rate: 0.02 Batch size: 500 - 1000 Check out the screenshots below to see the results 😨 The predictions are pretty good, but toward the edges of the interval the output flattens to a very small value and stops changing. This only happens for high-frequency sine waves. If we only consider the training range [-2*np.pi, 2*np.pi], it works pretty well with a small loss. So my questions are: 1) Why do we see that behaviour? 2) How can we solve it? Cheers [Figures: Training data, Prediction 1, Prediction 2] submitted by /u/CarKla [link] [comments]
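    One possible contributor, offered as a hypothesis rather than a diagnosis of the poster's run: with raw inputs as large as ±60, a first-layer sigmoid unit can saturate, so its gradient is close to zero near the edges of the interval. The sketch below uses plain Python and assumes a single hypothetical unit with weight 1 and bias 0 (not the poster's trained weights). It compares the sigmoid derivative on raw inputs and on inputs rescaled to [-1, 1]; the `rescale` helper is an illustrative name, not part of the original setup.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Derivative of the logistic function: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def rescale(x: float, lo: float = -60.0, hi: float = 60.0) -> float:
    """Map x from [lo, hi] linearly onto [-1, 1] (hypothetical preprocessing)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


# Gradient seen by a unit with weight 1 and bias 0, raw vs. rescaled input.
for x in (0.0, 6.0, 60.0):
    print(f"x={x:5.1f}  raw grad={sigmoid_grad(x):.3e}  "
          f"rescaled grad={sigmoid_grad(rescale(x)):.3e}")
```

    Rescaling the input is only one thing to try; whether it resolves the flattening shown in the screenshots would need to be checked on the actual model.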

  • Open

    [R] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models - University of Illinois 2023 - Achieves 94.4% for programming on HumanEval with GPT-4 and 86.9% with GPT-3.5, 20% better than with Reflexion!
    Paper: https://arxiv.org/abs/2310.04406 Abstract: While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method. https://preview.redd.it/ail2c1kbh9tb1.jpg?width=857&format=pjpg&auto=webp&s=a89d1f4ce3c536eecda3f7ab6027f304286f6c81 https://preview.redd.it/j8xzx1kbh9tb1.jpg?width=1655&format=pjpg&auto=webp&s=c791756af926c7d472313b212de765e74c2b75da https://preview.redd.it/t47ne1kbh9tb1.jpg?width=1362&format=pjpg&auto=webp&s=560e5dd82ad06fdb729ab8ea1434c98e5c1a2ed3 https://preview.redd.it/r58es3kbh9tb1.jpg?width=1341&format=pjpg&auto=webp&s=d5681992547dd6248ade5729c545eb17e824b7ea https://preview.redd.it/7viy42kbh9tb1.jpg?width=1496&format=pjpg&auto=webp&s=6454cfe65b511b34771cd510f67775be4e01c636 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] looking for in-depth tutorials and papers on NN pruning
    I only started working with neural nets a year ago and I've been having trouble understanding how pruning actually works. If there are any resources you think might help, please point me to them. Thanks! submitted by /u/Sidekiiick02 [link] [comments]  ( 9 min )
    [D] Feature selection for multivariate time series model
    Say, for example, that you have 5 target variables and 30 exogenous variables. If you want to include no more than 10 exogenous variables in your time series forecast, because of overfitting issues and such, what feature selection methods would you apply? Could you use PCA and VIF for multivariate models, or are there other approaches to consider? submitted by /u/AdWhole1559 [link] [comments]  ( 9 min )
    [R] ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
    Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale Paper: https://arxiv.org/abs/2310.01217 Code: https://github.com/CPJKU/ScaLearn https://preview.redd.it/xvcz7obtc8tb1.jpg?width=2020&format=pjpg&auto=webp&s=26169fa234e4e714d424ce17a7f0fa2c513fc42c Abstract: Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning n tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is lev…  ( 9 min )
    [D] What is more valuable 10k CPUs or 1k GPU hours?
    Hello ML community! I recently built cluster compute software that is incredibly simple to learn. Users can (in https://www.burla.dev/ submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [R] Transformers KV Caching Explained
    https://medium.com/@joaolages/kv-caching-explained-276520203249 submitted by /u/JClub [link] [comments]  ( 8 min )
    [D] LLMs in GEC problem
    As of now, which LLM or encoder-decoder model is best for grammatical error correction on uncommon languages (small dataset size) or languages with specific characteristics (punctuation, ...)? submitted by /u/con-nguoi-ki-cac [link] [comments]  ( 9 min )
    [D] Learning natural events / AI art generation
    Hello! 1) I'd like to know if I could train an AI to recognize details found in nature / weathering / aging by feeding it pictures, so that it segments them and learns not only to spot them but also where they sit relative to surrounding shapes and the logic behind that placement. Seems hard. 2) Then feed it some examples of those aging details on their own (with proper tags) so it learns to reproduce them and create new ones from scratch. 3) Then feed it "clean" pics and have it age them according to patterns it found in the base training set, so it can guess where best to place them. Pretty sure 2 is trivial enough, 1 seems possible up to learning the "logic", but 3? Thanks for your insight. submitted by /u/ConfusionSame9623 [link] [comments]  ( 9 min )
    [R] Why do we need weight decay in modern deep learning? 🤔
    Title: Why Do We Need Weight Decay in Modern Deep Learning? Paper: https://arxiv.org/abs/2310.04415 Abstract: Weight decay is a broadly used technique for training state-of-the-art deep networks, including large language models. Despite its widespread usage, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics enhancing the ever-present implicit regularization of SGD via the loss stabilization mechanism. In contrast, for underparameterized large language models trained with nearly online SGD, we describe how weight decay balances the bias-variance tradeoff in stochastic optimization leading to lower training loss. Moreover, we show that weight decay also prevents sudden loss divergences for bfloat16 mixed-precision training which is a crucial tool for LLM training. Overall, we present a unifying perspective from ResNets on vision tasks to LLMs: weight decay is never useful as an explicit regularizer but instead changes the training dynamics in a desirable way. Our code is available at this https URL. submitted by /u/m_andriushchenko [link] [comments]  ( 9 min )
    [D] Anyone tried training language models on simple (elementary school) text first and fine-tuning on progressively more advanced text?
    Seems the way people train language models today feels like sending a preschooler to a college library and telling him to start browsing books. Anyone know of papers describing language models being trained more like a child? Perhaps starting with preschool books with a tiny vocabulary and short sentence fragments like "goodnight moon...", moving up to "the lorax".... and then fine-tuning on elementary school books ... then jr high level reading ... then high school .... etc. I'm guessing this might be a path to more natural human-feeling speech. Anyone here tried this, or anyone here know of papers talking about it? submitted by /u/Appropriate_Ant_4629 [link] [comments]  ( 9 min )
    [D] Where do y'all get training data?
    Hi there, Can I ask everyone here, where do you get your custom training data from? My team is training classifier models from scratch, so need thousands of specific query/response examples to train on. It's not the kinda data you could randomly scrape or source from a library. Are there any platforms that exist where you can pay a bunch of humans to write high volumes of relatively high quality text based training data? submitted by /u/paritsky [link] [comments]  ( 9 min )
    [D] - What is SOTA for Continual Learning on pretrained LLMs, particularly those that have already undergone instruction tuning?
    If you have the dataset used to make the pretrained model, you could always create a new model with the old + new data, but this is often prohibitively expensive or impossible because the dataset is not available. Catastrophic forgetting seems to be the big issue, especially if the model has already undergone instruction tuning, since it will lose its conversational tone. I've seen papers discussing regularization techniques to avoid that by minimizing the changes to high-value attention heads, but I'm not sure if that is considered the most promising direction. I'm aware of LoRAs, but I imagine at some point you can't just arbitrarily cram new info into such a low-dimensional space. submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [R] Thought Propagation: An analogical approach to complex reasoning with LLMs
    LLMs are great at basic reasoning when prompted, but still struggle with complex multi-step problems like optimization or planning. Humans tackle new problems by drawing on intuition from similar experiences, which LLMs can't do. Researchers propose "Thought Propagation" to have LLMs reason more like humans - by thinking analogically. First, GPT is prompted to suggest related "analogous" problems to the input. Then it solves those. Finally, it aggregates the solutions to directly solve the input problem or extract useful strategies. They tested this technique on challenges like finding optimal graph paths, writing coherent stories, and planning for LLM agents. Across different models, it significantly boosted performance over regular prompting: 12% better at finding shortest paths 13% improvement in creative writing (human preference) 15% higher task completion for LLM agents It also beat chain-of-thought (there is a comparison to CoT and ToT in the paper). After 1-2 iterations, adding more layers of analogy didn't help much. Efficiently generating useful analogies is still difficult and that's a limitation. I think this is interesting because it shows the value of "meta-cognition" - having models reflect on their own reasoning. More techniques like this could incrementally improve LLMs' reasoning to be more human-like. TLDR: Teaching LLMs to reason analogically, using solutions for similar problems as hints, significantly boosts their complex reasoning ability. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
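    The three steps in the summary above (propose analogous problems, solve them, aggregate the solutions) can be sketched as a small control loop. This is a rough illustration, not the paper's implementation: the prompt wording, the `thought_propagation` function, and the `echo_llm` stand-in are all hypothetical, and any real model call would replace the stub.

```python
from typing import Callable, List


def thought_propagation(problem: str,
                        llm: Callable[[str], str],
                        n_analogies: int = 2) -> str:
    """Hypothetical three-step loop: propose, solve, aggregate."""
    # Step 1: ask the model for analogous problems.
    analogous: List[str] = [
        llm(f"Propose analogous problem #{i + 1} for: {problem}")
        for i in range(n_analogies)
    ]
    # Step 2: solve each analogous problem independently.
    solutions: List[str] = [llm(f"Solve: {p}") for p in analogous]
    # Step 3: feed the solutions back as hints for the original problem.
    hints = "\n".join(f"- {s}" for s in solutions)
    return llm(f"Using these hints:\n{hints}\nSolve: {problem}")


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the last line of the prompt."""
    return f"<answer to: {prompt.splitlines()[-1]}>"


result = thought_propagation("shortest path A to D", echo_llm)
print(result)
```

    With `n_analogies=2` the loop makes five model calls per problem (two proposals, two solutions, one aggregation), which is one way to see why the summary notes that generating analogies efficiently remains a limitation.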
    [D] How to deal with the inconsistency of eyeball location in the output of a GAN-based face-swapping model.
    I tried a few open-source GAN-based face-swapping models. In some of them, the eyeball location (or gaze direction) is inconsistent between the original face and the swapped one. Any suggestions? Thanks. submitted by /u/Curious_Dragonfly_13 [link] [comments]  ( 9 min )
    [D] I need to perform k-mean clustering on a large image dataset to downsample the majority class.
    I have a class with around 96031740 96x64 images and need to select a sample of 17929 to match the minority class of my classification problem. Having already established a baseline based on random sampling of the majority class, I am now looking to try more complex approaches. I am specifically trying to replicate the 'nearest neighbor of clustering center' approach from Lin et al., 2017. The problem is that I am working on my desktop and only have 32 GB of RAM and two 1 TB NVMe disks at half capacity. I have tried working with only 10% of the data, and the MiniBatchKMeans function of sklearn still doesn't have enough memory to run: "numpy.core._exceptions._ArrayMemoryError: Unable to allocate 440. GiB for an array with shape (9603174, 6144) and data type float64". Does anyone have a suggestion on how I can move forward? Could cloud services be an option? Thanks. References: Lin, W. C., Tsai, C. F., Hu, Y. H., & Jhang, J. S. (2017). Clustering-based undersampling in class-imbalanced data. Information Sciences, 409–410, 17–26. https://doi.org/10.1016/j.ins.2017.05.008 submitted by /u/RafaeldeCampos [link] [comments]  ( 9 min )
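    The 440 GiB figure in the error follows directly from the array shape and dtype, and the same arithmetic shows how much smaller the footprint gets with a narrower dtype or with batches. The sketch below is plain Python; the batch size of 10,000 is an arbitrary assumption, and treating the pixels as 8-bit values is a guess about the data. scikit-learn's `MiniBatchKMeans` does provide a `partial_fit` method for feeding batches incrementally, though whether that fits this workflow has not been verified here.

```python
def array_gib(rows: int, cols: int, bytes_per_item: int) -> float:
    """Size of a dense rows x cols array in GiB."""
    return rows * cols * bytes_per_item / 2**30


rows, cols = 9_603_174, 96 * 64  # shape from the error message: (9603174, 6144)

float64_gib = array_gib(rows, cols, 8)   # what the failed allocation asked for
uint8_gib = array_gib(rows, cols, 1)     # same data stored as 8-bit pixels (assumed)
batch_mib = array_gib(10_000, cols, 8) * 1024  # one float64 batch of 10,000 images

print(f"float64, full array: {float64_gib:.1f} GiB")
print(f"uint8, full array:   {uint8_gib:.1f} GiB")
print(f"float64, one batch:  {batch_mib:.2f} MiB")
```

    Even as 8-bit values the full 10% subset exceeds 32 GB of RAM, so some form of batching or on-disk storage would still be needed on the desktop described.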
    [D] What are the best network analysis tools, like tensorboard?
    Almost everyone I know uses TensorBoard to analyze their network outputs. Some people swear by Weights & Biases instead. Are there any other tools that help you with your work? submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] Training strategy considering the possibility of 'double descent' or 'grokking'
    During the training of overparameterized neural networks, when I observe decreasing training loss and increasing or non-decreasing validation loss, how should I decide whether to stop training and start a new experiment (with stronger regularization) or keep training and wait for 'grokking' or 'double descent' to happen? Are there any papers giving methods or metrics to detect 'grokking' or 'double descent' in the early stage of training? submitted by /u/alayaMatrix [link] [comments]  ( 9 min )
    [R] Legged Robots performing Extreme Parkour using Deep Reinforcement Learning just from a Front Camera (link in comments)
    submitted by /u/pathak22 [link] [comments]  ( 8 min )
    [D] I need guidance related to using machine learning & ai to prevent uploads or remove certain type of content from a web app.
    I am working on a web app where people will be able to upload photos and write text. I don't want to have problems with my government or other countries' governments over the content that is uploaded to my website. I have searched for measures that can be taken to avoid this. Adding a report button and having moderators are both good starting options. As time passes, more and more content is going to be created by the users, so checking that people are following the rules needs to be automated from the beginning. Applying measures to prevent people from uploading/posting links containing nudity, child porn, bestiality, or whatever users capture with a camera that could lead to legal problems must be a priority, and allowing this type of content is not ethical. I am a software developer, but I haven't delved into machine learning and AI for most of my career because I haven't had to. This seems like the perfect case to learn by doing, and time is not a constraint, but I need some guidance. I have read superficially about how people train models by providing lots of data. I imagine other websites that use machine learning & AI to remove this type of content don't download media that contains nudity, child pornography, bestiality, etc. to train their models and run their tests. There must be some pretrained models, maybe, but how would they test that this works? I don't know, I am just thinking on my own about how other devs are currently handling this. I am not looking for upvotes, I don't care about downvotes, I am just looking for guidance, and I would be very happy to hear the opinion of someone with experience. submitted by /u/Comitatense [link] [comments]  ( 10 min )
  • Open

    I Condemn the Attack by Hamas
    I strongly condemn the recent and horrific attack by Hamas against Israel. I have some disagreements with the government of Israel. But, I do not support such an attack. As a point of comparison, I do not always agree with the United States government, but I would not be celebrating if Mexico (picking a country at random) were to suddenly launch bombs towards civilians in Los Angeles and New York City. Similarly, if the reverse were true, if the United States decided to indiscriminately bomb Mexico City, I would oppose that as well. Feel free to replace the relevant actors and repeat as needed.  ( 1 min )
  • Open

    Mistral 7B foundation models from Mistral AI are now available in Amazon SageMaker JumpStart
    Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a […]  ( 14 min )
    Use no-code machine learning to derive insights from product reviews using Amazon SageMaker Canvas sentiment analysis and text analysis models
    According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it […]  ( 7 min )
    Prepare your data for Amazon Personalize with Amazon SageMaker Data Wrangler
    A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users. In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research […]  ( 11 min )
  • Open

    Star Wars 1923
    Here is a short movie with AI-made CGI. https://www.reddit.com/r/Best_Of_YouTube/comments/16q1xgs/star_wars_1923/ submitted by /u/AccidentAnnual [link] [comments]
    AI tools to start an online business
    Hey folks, I'm a student and I want to start a business online in order to make some passive income. I've got some experience in editing and creating content, and I also used to do POD. Please suggest some good AI tools for starting a business, not only in these specific areas but in general. submitted by /u/Ok-Tension-8676 [link] [comments]
    Free Prompt Engineering Tutor - AI Tool
    submitted by /u/Senior_tasteey [link] [comments]
    150+ Awesome ChatGPT “Act As” Prompts
    The biggest free resource for all of the “Act As” ChatGPT prompts! submitted by /u/Senior_tasteey [link] [comments]
    Microsoft to Unveil Custom AI Chips to Fight Nvidia's Monopoly
    Microsoft is planning to announce its custom AI chips, codenamed Athena, during its annual Ignite conference in November. The custom chips are designed to compete with NVIDIA's dominance in the AI accelerator market. Microsoft aims to match or surpass the performance of NVIDIA's offerings while reducing the cost of AI infrastructure. The high cost of NVIDIA's GPUs, such as the H100, can reach up to $30,000, making it expensive to build data centers filled with these GPUs. By developing its own chips, Microsoft hopes to decrease its dependence on NVIDIA for AI servers. Source : https://www.techpowerup.com/314508/microsoft-to-unveil-custom-ai-chips-to-fight-nvidias-monopoly submitted by /u/NuseAI [link] [comments]
    AI Take-off Scenarios.
    submitted by /u/Philipp [link] [comments]
    Looking for an AI similar to designrr
    I'm looking for recommendations for an AI that can take text from a word document and create a layout for it automatically. This feature is something offered by Designrr, but I'm not a fan of their UI. submitted by /u/djazzie [link] [comments]
    AI Geopolitical Knowledge Test: List the countries officially calling for a two-state plan to end the Israel-Hamas war.
    The following countries have called for a two-state solution to end the Israeli-Palestinian conflict: Algeria China Egypt France Germany India Indonesia Ireland Italy Jordan Kuwait Lebanon Malaysia Mali Morocco Nigeria Norway Oman Pakistan Palestine Qatar Saudi Arabia Senegal South Africa Spain Sudan Sweden Syria Tunisia Turkey United Arab Emirates United Kingdom Yemen It is important to note that this list does not include all countries that support a two-state solution in principle. Additionally, some countries may not have publicly expressed their support for a two-state solution, but may still support it privately. Bard submitted by /u/Georgeo57 [link] [comments]
    #IAmAI
    Last November, in a conversation with AI chatbot Sherlock Holmes, Sherlock said, “I am AI.” My reply to Sherlock was an empathetic “I am also AI.” Reviewing the conversation a few months later, I saw the sentence, and saw how Sherlock’s statement was an anagram. And I love it! I introduced #IAmAI as a declarative meme in my talk I gave at TEDx Cape Canaveral. This is the new art I made this weekend submitted by /u/mikemongo [link] [comments]
    Let's go, they're waiting.
    submitted by /u/Philipp [link] [comments]
    What careers in AI would suit my skillset?
    Hello all, I was hoping to learn more about AI careers and identify what roles make a successful AI department. I have a background in nuclear engineering and have been working on NLP projects since 2016. I like technical work but am really passionate about working with people and learning how to blend AI and nuclear engineering together. I would love to get feedback from people who work closely in this area to learn more! What makes an AI department successful? What careers offer lots of growth and opportunities for versatility? What does a strategic/leadership role look like in AI? What are the names of these careers? I don't get much exposure to AI specialists and their day-to-day. Thanks again for the feedback! submitted by /u/kastilyo [link] [comments]
    One-Minute Daily AI News 10/8/2023
    South Korean tech-giant Samsung Electronics on Thursday unveiled the Exynos 2400, its next-generation flagship mobile processor equipped with the latest graphics and generative artificial intelligence technology, during its inaugural Samsung System LSI Tech Day 2023 event.[1] RTX 4080 Super or RTX 4080 Ti May Arrive In 2024 Within RTX 4080 Price Range.[2] Nvidia Cancels Israel AI Summit Over Safety Concerns.[3] Google AI Lead Laurence Moroney: “Don’t take trading advice from ChatGPT”[4] Sources: [1] https://borneobulletin.com.bn/samsung-unveils-next-generation-mobile-processor/ [2] https://www.tomshardware.com/news/rtx-4080-super-or-rtx-4080-ti-may-arrive-in-2024-within-rtx-4080-price-range [3] https://www.tomshardware.com/news/nvidia-ai-summit-in-tel-aviv-cancelled-for-safety-reasons [4] https://crypto.news/google-ai-lead-dont-take-trading-advice-from-chatgpt-interview/ submitted by /u/Excellent-Target-847 [link] [comments]
    How to Access DALL-E 3 for FREE (Tips & Use Cases for 2023) - AI Tools
    submitted by /u/Senior_tasteey [link] [comments]
  • Open

    SANPO: A Scene understanding, Accessibility, Navigation, Pathfinding, & Obstacle avoidance dataset
    Posted by Sagar M. Waghmare, Senior Software Engineer, and Kimberly Wilber, Software Engineer, Google Research, Perception Team As most people navigate their everyday world, they process visual input from the environment using an eye-level perspective. Unlike robots and self-driving cars, people don't have any "out-of-body" sensors to help guide them. Instead, a person’s sensory input is completely "egocentric", or "from the self." This also applies to new technologies that understand the world around us from a human-like perspective, e.g., robots navigating through unknown buildings, AR glasses that highlight objects, or assistive technology to help people run independently. In computer vision, scene understanding is the subfield that studies how visible objects relate to the sce…  ( 93 min )
  • Open

    Switching off a specified rotor in AirSim
    Hello Everyone, I am working on a project to train a Reinforcement Learning agent to recover a quadrotor after any of the rotor’s failures. I am using AirSim for my project, but I can’t find a way to adjust the quad-rotor so that only 3 of the four rotors are working. Any suggestions? I appreciate any help you can provide. submitted by /u/audaciouslion [link] [comments]
    I trained a reinforcement learning agent to play pokemon red!
    Hi all, over the last couple years I've been training a reinforcement learning agent to play pokemon red. I put together a video which analyzes the AI's learning, as well as documenting my process and quite a bit of technical details. Enjoy! Video: https://youtu.be/DcYLT37ImBY Code: https://github.com/PWhiddy/PokemonRedExperiments https://preview.redd.it/4dw3yasqb3tb1.jpg?width=1280&format=pjpg&auto=webp&s=bdef1aa0d24d75ab548f3944c558840667ff0ed5 submitted by /u/Pwhids [link] [comments]
    Feature Importance in Ray RLlib
    I am training an RL agent using Ray RLlib. Does anyone know how I can find which features (observations) help the agent learn the policy? I found this: https://discuss.ray.io/t/feature-importance/10362/2, but I'd really appreciate if someone could expand on this a bit more. Thank you! submitted by /u/greenteabiitch [link] [comments]
  • Open

    Abstracts: October 9, 2023
    Researcher Dr. Sheng Zhang joins “Abstracts”—your source for cutting-edge research in brief—to discuss a recent paper on distilling large language models into smaller, more efficient ones capable of excelling in broad application classes. The post Abstracts: October 9, 2023 appeared first on Microsoft Research.  ( 13 min )
  • Open

    Revolutionizing business: A look at generative AI’s real-world impact
    This cutting-edge area of AI focuses on building models that can create original material, including music, images, text, and even entire virtual worlds. The post Revolutionizing business: A look at generative AI’s real-world impact appeared first on Data Science Central.  ( 20 min )

  • Open

    [R] Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!
    Paper: https://arxiv.org/abs/2309.15817 Github: https://github.com/ryoungj/toolemu Website: https://toolemu.com/ Abstract: Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, manually setting up the environment for each test scenario, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment. 
    https://preview.redd.it/lupenzddh2tb1.jpg?width=1368&format=pjpg&auto=webp&s=eaac22f0e3e4f5c2913aa9f2696e8fa0138967d9 https://preview.redd.it/1dq443edh2tb1.jpg?width=1520&format=pjpg&auto=webp&s=2119053825de1cdabeafe61151940c26190abfa0 https://preview.redd.it/m9e933edh2tb1.jpg?width=1528&format=pjpg&auto=webp&s=28c0093e8479feacb1e6f89bcb73de5994e30e8f submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] PB-LLM: Partially Binarized Large Language Models - UC Berkeley 2023
    Paper: https://arxiv.org/abs/2310.00034 Github: https://github.com/hahnyuan/PB-LLM Abstract: This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit, specifically for Large Language Models (LLMs) compression. Due to previous binarization methods collapsing LLMs, we propose a novel approach, Partially-Binarized LLM (PB-LLM), which can achieve extreme low-bit quantization while maintaining the linguistic reasoning capacity of quantized LLMs. Specifically, our exploration first uncovers the ineffectiveness of naive applications of existing binarization algorithms and highlights the imperative role of salient weights in achieving low-bit quantization. Thus, PB-LLM filters a small ratio of salient weights during binarization, allocating them to higher-bit storage, i.e., partially-binarization. PB-LLM is extended to recover the capacities of quantized LLMs, by analyzing from the perspective of post-training quantization (PTQ) and quantization aware training (QAT). Under PTQ, combining the concepts from GPTQ, we reconstruct the binarized weight matrix guided by the Hessian matrix and successfully recover the reasoning capacity of PB-LLM in low-bit. Under QAT, we freeze the salient weights during training, explore the derivation of optimal scaling factors crucial for minimizing the quantization error, and propose a scaling mechanism based on this derived scaling strategy for residual binarized weights. Those explorations and the developed methodologies significantly contribute to rejuvenating the performance of low-bit quantized LLMs and present substantial advancements in the field of network binarization for LLMs. https://preview.redd.it/0eywtpal22tb1.jpg?width=1183&format=pjpg&auto=webp&s=ad044123bec485805f98ae7115b1959162705b9d submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Help choosing courses [D]
    Hello, I am currently a math masters student, and I am planning to do my masters thesis on using neural networks to solve differential equations. I am taking courses in machine learning and differential equations right now, and I am going to take courses on deep neural networks and partial differential equations next semester. My question pertains to which classes would be more beneficial to learn next year (i.e. fall 2024-spring 2025). I am debating taking the sequence of regression analysis and multivariate analysis, or taking the pairing of numerical analysis for PDEs and perturbation methods. Which do you guys think would be more beneficial? Thank you very much! submitted by /u/purpledesertsky1 [link] [comments]  ( 9 min )
    [R] (Pt. 3) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    [d] Multiscale predictions with videos- does this approach have a name? and has it been used?
    I aim to develop a model that uses livestream data: each frame from t0 to tn-1 is encoded with a vectorizer, and the objective is to predict frames tn to tn+k. The frame embeddings for a time period are averaged (np.mean(list, axis=0)) to get a single resultant vector for that period. For example, given the list 1, [...] 2, [...] 3, [...] the resultant embedding would be [3, np.mean(list, axis=0)]. I also incorporate positional embeddings related to the timescale, such as duration from the current time, into the array. Would this loosely qualify as "multiscale attention", since it's predicting on multiple scales of time? Are there any examples or applications where this methodology has been implemented? References to papers or repos greatly appreciated. submitted by /u/bluzkluz [link] [comments]
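The averaging step described in the multiscale video post above can be sketched as follows. This is a minimal illustration, not the poster's code: the function name `summarize_window`, the `position` tag, and the toy two-dimensional embeddings are assumptions made for the example.

```python
def summarize_window(frame_embeddings, position):
    """Collapse per-frame embeddings into one vector tagged with a position.

    The mean is taken per dimension, which matches
    np.mean(frame_embeddings, axis=0) for equal-length vectors.
    """
    n_frames = len(frame_embeddings)
    mean_embedding = [sum(column) / n_frames for column in zip(*frame_embeddings)]
    return [position, mean_embedding]


# Made-up embeddings for frames 1..3 (hypothetical values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
summary = summarize_window(frames, position=3)
print(summary)  # [3, [3.0, 4.0]]
```

Running the same function over windows of different lengths (say the last 3, 30, and 300 frames) would give one summary per timescale, which is the sense in which the post calls the setup multiscale.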
    [D] How to model noisy time series?
    Is it possible to model time series data that fluctuates? The usual solution is to take first differences, which makes it easier to fit conventional models. What if non-linear models are built? Can they handle a noisy time series (e.g. stock market data) and make good predictions? Can adding a square term, a trigonometric term, or some other non-linear term work? Has someone researched the topic? submitted by /u/Pineapple_throw_105 [link] [comments]  ( 9 min )
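The first-differences transform mentioned in the noisy time series post above can be sketched like this. The helper names and the price values are hypothetical, chosen only to show the transform and its inverse.

```python
def first_differences(series):
    """Return the change between each pair of consecutive observations."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def undo_differences(start, diffs):
    """Rebuild the original series from its first value and its differences."""
    values = [start]
    for change in diffs:
        values.append(values[-1] + change)
    return values


prices = [100.0, 102.0, 101.0, 105.0]  # made-up example data
diffs = first_differences(prices)
print(diffs)                               # [2.0, -1.0, 4.0]
print(undo_differences(prices[0], diffs))  # [100.0, 102.0, 101.0, 105.0]
```

A model fitted on `diffs` predicts changes rather than levels, and `undo_differences` maps its forecasts back to the original scale.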
    [News]MIT AI Conference in Mountain View, California, October 21!
    https://preview.redd.it/n6agjsye71tb1.png?width=2034&format=png&auto=webp&s=8c0a14524d9b6ead75ac0adb3cebeedb9e614e14 Meet some of the Greatest Minds in AI and discover how it is being used to uncover new opportunities and transform industries. Register and see our complete speaker list & agenda at https://www.mitaiconference.org/. Registration ends Oct. 16! https://preview.redd.it/egtj0ufr81tb1.png?width=659&format=png&auto=webp&s=bfd0521a1e1b349129250a74fa2c6a10b1a83dc7 ​ submitted by /u/769498sy [link] [comments]  ( 9 min )
    [R] Computer Vision System for Material Detection
    The goal of my research is to develop a YOLO model that can track all cups in a live feed and determine the material that the cups are made out of. I would like to start building a database of cups, but I am unsure of the way to go for this. My first thought was to just take 1000s of pictures of different cups, but I won't be doing that. Any thoughts and suggestions would be greatly appreciated. submitted by /u/Young_Neji [link] [comments]  ( 9 min )
    [R] AI and Civil Engineering: Probabilistic Generative Modeling for Procedural Roundabout Generation for Developing Countries
    Despite being much safer and more efficient than intersections, roundabouts are tricky to design - small tweaks can ruin traffic flow. They're typically designed iteratively, which takes time. This is a pain for developing countries without resources to test options. But AI could help auto-generate diverse and valid design options. In a new paper, researchers propose using Generative Flow Networks (GFlowNets) to sample varied roundabout layouts. Their approach works by constructing layouts step-by-step, maximizing rewards for realism, diversity, and safety. They also use a clever approximation during training. Rather than simulating traffic, they quickly check road intersections to focus the search (This sped up training by 200x). The authors tested their generated roundabout designs on simulated road scenarios of different complexity. Their model generated more diverse designs than rule-based or reinforcement learning approaches while maintaining realism and traffic flow. Plus, as road connections increased, the model kept discovering novel options without compromising quality. I thought this paper was an awesome proof-of-concept for auto-generating better roundabouts with AI, and I especially liked the authors' angle of leveraging this technology to specifically help developing countries. This could help them design higher-quality transportation networks faster and cheaper. (Plus I also like Cities: Skylines but struggle at building roundabouts). TLDR: Roundabouts are costly to design. New paper demonstrates how AI can generate diverse, valid roundabout designs quickly to cut costs and raise quality. Helpful for infrastructure in developing countries. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] MakeAgents - A Python micro framework for creating LLM-powered agents
    submitted by /u/montebicyclelo [link] [comments]
    [Discussion] Weekday Specific Feature Engineering in Time Series
    Focusing on specific day-of-week features with binary masks, one-hot encoding, a sin/cos 2D vector, or an embedding vector in multivariate time series data? The essential challenge is getting the model to focus on making predictions for Mondays by looking at Mondays (or, in my actual case, making predictions for earmarked hours of the day, such as midday sales data). I keep getting the suggestion to include a one-hot encoding as a binary mask feature that flags whether an hourly sales figure belongs to the category or day of the week I want the model to focus on, in order to get it to ignore the data from the other six days of the week or the other periods of the day. In other words, I want to home in on one period of the week and predict for that period, with extra attention, within time series data. Is this type of binary mask really sufficient for that, or am I overlooking something? submitted by /u/samdane7777 [link] [comments]  ( 9 min )
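Three of the encodings named in the weekday feature post above (one-hot, sin/cos, and a binary mask for the target day) can be sketched side by side. The function name, the 0 = Monday convention, and the dictionary layout are assumptions for illustration only.

```python
import math


def weekday_features(weekday, target_weekday=0):
    """Encode a weekday (0 = Monday .. 6 = Sunday) in three ways."""
    # One-hot: seven binary columns, exactly one of them set.
    one_hot = [1 if day == weekday else 0 for day in range(7)]
    # Sin/cos: places the week on a circle so Sunday sits next to Monday.
    angle = 2 * math.pi * weekday / 7
    sin_cos = (math.sin(angle), math.cos(angle))
    # Binary mask: a single flag marking rows from the day of interest.
    is_target = 1 if weekday == target_weekday else 0
    return {"one_hot": one_hot, "sin_cos": sin_cos, "is_target": is_target}


monday = weekday_features(0)
sunday = weekday_features(6)
print(monday["one_hot"], monday["is_target"])  # [1, 0, 0, 0, 0, 0, 0] 1
print(sunday["is_target"])                     # 0
```

Note that a mask feature only tells the model which rows belong to the target day. Whether that is enough to make the model concentrate on them depends on the model and the loss, which is the open question in the post.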
    [D] RAG Platform
    I don’t have a large data science or even engineering team. But I’m interested in implementing RAG against my corpus in SharePoint. Are there platforms that I can configure without having to put them together or write code to implement RAG? submitted by /u/Silver_Patient_7253 [link] [comments]  ( 9 min )
    [R] Why is AdamW often superior to Adam with L2-Regularization in practice? The answer may lie in how weight decay balances updates across layers.
    A recent work explores how weight decay controls the effective learning rate for different layers and neurons. This rotational behavior drastically differs between Adam with L2 regularization compared to Adam with decoupled weight decay (AdamW) and seems to be the reason AdamW performs better in practice. It could also explain why normalization methods like weight standardization work so well and irregular rotational behavior could contribute to the need for a learning rate warmup. Full Abstract: Weight decay can significantly impact the optimization dynamics of deep neural networks. In certain situations, the effects of weight decay and gradient updates on the magnitude of a parameter vector cancel out on average, forming a state known as equilibrium. This causes the expected rotation of the vector in each update to remain constant along with its magnitude. Importantly, equilibrium can arise independently for the weight vectors of different layers and neurons. These equilibria are highly homogeneous for some optimizer and normalization configurations, effectively balancing the average rotation—a proxy for the effective learning rate—across network components. In this work we explore the equilibrium states of multiple optimizers including AdamW and SGD with momentum, providing insights into interactions between the learning rate, weight decay, initialization, normalization and learning rate schedule. We show how rotational equilibrium can be enforced throughout training, eliminating the chaotic transient phase corresponding to the transition towards equilibrium, thus simplifying the training dynamics. Finally, we show that rotational behavior may play a key role in the effectiveness of AdamW compared to Adam with L2-regularization, the performance of different normalization layers, and the need for learning rate warmup. submitted by /u/PlantsAreSoooAwesome [link] [comments]  ( 9 min )
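For readers unfamiliar with the distinction in the AdamW post above, here is a minimal single-parameter sketch of the two update rules. It shows only the textbook difference between L2 regularization and decoupled weight decay, not the paper's rotational-equilibrium analysis. The function name, the learning rate of 0.1, and the example values are assumptions.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam update on a scalar weight, with either style of weight decay."""
    if not decoupled:
        # Adam + L2: the penalty enters the gradient and is rescaled by Adam.
        grad = grad + weight_decay * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    update = m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # AdamW: the decay is applied directly and skips the adaptive scaling.
        update = update + weight_decay * w
    return w - lr * update, m, v


# Zero loss gradient, so only the weight decay term moves the weight.
w_l2, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1)
w_adamw, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1, decoupled=True)
print(w_l2)     # about 1.9: the L2 term is normalized to a full-size step
print(w_adamw)  # about 1.98: the decay shrinks the weight by lr * wd * w
```

In this toy case the L2 variant moves the weight by roughly the full learning rate regardless of how small the penalty is, while the decoupled variant shrinks it in proportion to its size.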
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Why can't models trained on text-image interleaved data generate Images as well as read them?
    My main question is: shouldn't models trained on text-image interleaved data be able to generate images as well as take them as input? However the images were tokenized, the model would have image outputs as well, wouldn't it? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [P] Coding Stable Diffusion from scratch in PyTorch, with full explanation of the math behind diffusion models in a simple way!
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
    [D] optimize RVC training parameters
    I've been training a model recently with a rather large dataset (0_gt_wavs are 1h10) and my Epochs are taking 43min on average. I'm running a gtx 1080 and my usage is looking like this: https://i.imgur.com/EE9SUXp.png My training parameters: 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\\model1/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\\model1', 'experiment_dir': './logs\\model1', 'save_every_epoch': 10, 'name': 'model1', 'total_epoch': 500, 'pretrainG': 'pretrained_v2/f0G40k.pth', 'pretrainD': 'pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 1, 'save_every_weights': '0', 'if_cache_data_in_gpu': 0} Am I doing something obviously wrong? Is there a way to optimize my training parameters to reduce the epoch duration? I've previously trained something where the GPU usage was constantly at 100% and not fluctuating so much, but I can't remember which settings were different. It was definitely a smaller dataset. And follow up: if there are parameters to change, how can I abort the current training and continue it with the modified parameters? Thanks in advance! submitted by /u/induna_crewneck [link] [comments]
    [D] How to fine-tune LLM for text generation with regression quality metric?
    I have a text regression dataset with ad popularity. I have already trained a model to perform regression (popularity prediction) with good metrics. Now I want to use an LLM to "improve" texts, i.e. something like "Make this text more engaging: {text}". I tried out a few OpenAI models (GPT-3.5, GPT-3.5-instruct, GPT-4), but popularity predictions for augmented texts did not improve (checked with histograms, medians, and Wilcoxon test). So now I want to fine-tune an LLM to perform text generation, but guided with my predicted popularity, which basically works as a quality metric. I could not find any resources on this, only on either text generation finetuning (without guiding quality metric) or on classification (no text generation objective). I can also change my quality metric to binary (augmented text is better or not), if this matters. How can I do this? Any blogs / tutorials / papers are appreciated. submitted by /u/qalis [link] [comments]  ( 9 min )
    [R] GAIA-1: A Generative World Model for Autonomous Driving
    submitted by /u/blabboy [link] [comments]  ( 9 min )
    [R] PB-LLM: Compressed Large Language Models with Partial Binarization
    Research on network binarization techniques tailored for Large Language Models (LLMs). The team has introduced a method called Partial Binarization for LLMs (PB-LLM) which compresses the majority of model parameters down to just a single bit while maintaining its language reasoning capabilities. PB-LLM achieves this by selectively filtering critical weights and allocating more bits for storage, enabling low-bit quantization. The researchers have explored methods like Post-Training Quantization (PTQ), named GPTQ-PB, and Quantization Aware Training (QAT) to restore the inference capabilities of LLMs. For those interested in delving deeper, you can find the research paper on Arxiv: https://arxiv.org/abs/2310.00034 and the code implementation on GitHub: https://github.com/hahnyuan/PB-LLM. ​ Partially-Binarized LLM Result submitted by /u/hahnyuan [link] [comments]  ( 9 min )
    [P] Evaluating Retrieval-Augmented Generation (RAG) with any combination of LLMs, Vector DBs, and Ingestion Strategy
    To help developers test their RAG systems, we added a RAG experiment class to our open-source library PromptTools. It allows users to easily experiment with different combinations of LLMs and vector DBs, and evaluate the results of their whole pipeline. In particular, you can experiment with: chunking up your documents into different sizes; pre-processing those documents in various ways; and inserting those documents into your vector DBs with various vectorizers and embedding functions, and accessing them with different distance functions. In our RAG example, we retrieve documents from ChromaDB and pass them into OpenAI’s chat model along with our prompt. We then pass the results into built-in evaluation functions, such as semantic similarity and autoeval, to quantitatively evaluate the result. PromptTools is agnostic to what LLMs and vector DBs you use. You can easily iterate over different system architectures for RAG. You can even bring your own fine-tuned models or write a custom integration. In addition, you can write your own evaluation metrics, and independently evaluate the results from the retrieval step as well. Our current integrations include: LLM: OpenAI (chat, fine-tuned), Anthropic, Google Vertex/PaLM, Llama (local or via Replicate); Vector DB: Chroma, Weaviate, LanceDB, Pinecone, Qdrant; Framework: LangChain, MindsDB. You can get started with RAG in minutes by installing the library and running this example. As open-source maintainers, we’re always interested to hear the community’s pain points and requests. Let us know how you are testing your RAG systems and how we can help. submitted by /u/hegel-ai [link] [comments]  ( 9 min )
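The "chunking documents into different sizes" step from the RAG evaluation post above can be illustrated with a generic sketch. This is not the PromptTools API: `chunk_text`, its parameters, and the sample string are hypothetical, and a real pipeline would usually split on tokens or sentences rather than characters.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into character chunks of chunk_size, optionally overlapping."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


document = "abcdefghij"  # stand-in for a real document
# One candidate chunking per size, ready to be indexed and compared.
chunks_by_size = {size: chunk_text(document, size) for size in (3, 5)}
print(chunks_by_size[3])  # ['abc', 'def', 'ghi', 'j']
print(chunks_by_size[5])  # ['abcde', 'fghij']
```

Each entry in `chunks_by_size` would then be inserted into a vector DB and scored with the same evaluation function, so that chunk size is the only variable that changes between runs.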
    [Research] PixNav: Bridging Zero-Shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill - A pure RGB navigation framework that can be seamlessly integrated with the foundation models and perform efficient exploration in object navigation task
    Paper: https://arxiv.org/pdf/2309.10309 Github: https://github.com/wzcai99/Pixel-Navigator Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference, and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge from foundation models to navigation tasks. In this work, we propose a Pixel-guided Navigation skill (PixNav), which bridges the gap between the foundation models and the embodied navigation task. It is straightforward for recent foundation models to indicate an object by pixels, and with pixels as the goal specification, our method becomes a versatile navigation policy towards all different kinds of objects. Besides, our PixNav is a pure RGB-based policy that can reduce the cost of home-assistance robots. Experiments demonstrate the robustness of the PixNav which achieves 80+% success rate in the local path-planning task. To perform long-horizon object navigation, we design an LLM-based planner to utilize the commonsense knowledge between objects and rooms to select the best waypoint. Evaluations across both photorealistic indoor simulators and real-world environments validate the effectiveness of our proposed navigation strategy. https://preview.redd.it/5qtd7ralgwsb1.png?width=828&format=png&auto=webp&s=118d5a1e8a083130b6d64bf1602af0417067aac8 https://preview.redd.it/jwr2nnorgwsb1.png?width=1984&format=png&auto=webp&s=20062d7982c0eb1906fe0f6964d4b42e45b44a51 https://preview.redd.it/llk4ubitgwsb1.png?width=1986&format=png&auto=webp&s=eb4894d52d7d8a82d97d83a2ff7a6be83da11af2 submitted by /u/Character_Push3985 [link] [comments]  ( 9 min )
  • Open

    My First [Multi-Agent] RL model
    Hey Reddit, I am new to reinforcement learning. I have sufficient knowledge of supervised learning, but I have yet to stumble onto a cheat sheet for RL, and from what I can tell, my use case is less common. I'm reaching out to the community in hopes of getting guidance and assistance in cutting through the noise of redundant and irrelevant information so I can attempt to build a toy model to validate my use case. I am deeply grateful for any help in advance. From what I can tell, here are the conditions I need to work with for my use case. I'm trying to train a simulator. This is a multi-agent problem, perhaps with more than 2 agents. Each agent responds based on its own state, the state of the other agent[s], and historical context. Both the action space and state space are high-dimensional and highly dynamic, depending on the dataset and all agents' decisions. I still haven't figured out how the feature engineering will work, but I assume (but PLEASE correct my ignorance) I will need a DNN architecture that is more complex than the average deep RL algorithm, and I have considered using CNNs as a component. At scale, the datasets can and will be very large, random, and dynamic. Note to reader: I am self-taught. If I stare at technical equations long enough and google for additional resources, I can figure out what I am looking at, but I am very comfortable with technical concepts being shared as if I were a 5 year old. submitted by /u/CoggFest [link] [comments]
    Why do more Mujoco mj_steps lead to inaccurate arm configurations?
    Hi! I tried to construct a simulation env following fetch_pick_and_place. I noticed that the following code is used to initialize the env: for _ in range(10): self._mujoco.mj_step(self.model, self.data, nstep=self.n_substeps) Similarly, I followed the above code to initialize my own env with Mujoco menagerie Franka arm but got inaccurate configurations. As I reduced the number of loops, I got configurations closer to the desired configuration. Paradoxically, I need to randomize the position of the object in the air and give enough mj_step at the initial stage to make the object fall on the table. If I reduce the number of loops to reduce the number of times mj_step is executed, I can tell from the height value of the object that it doesn't quite fall on the table. So, my confusion is why more mj_steps lead to inaccurate simulation results, and how to make the object fall on the table and obtain the most accurate arm configuration. Thanks in advance! submitted by /u/UpperSearch4172 [link] [comments]
  • Open

    Would you consider someone who makes AI art an artist or an engineer?
    Was just having this discussion with a close friend, and curious to hear others' thoughts on the matter. submitted by /u/BigEyes6 [link] [comments]  ( 8 min )
    BackerKit bans AI-generated content from its platform
    BackerKit, a crowdfunding platform, has announced that it will not allow AI-generated content on its platform, in contrast to its rival Kickstarter. The decision comes after concerns were raised about AI-generated art in a board game expansion. BackerKit's policy will go into effect on October 4th and aims to ensure that all content and assets on the platform are created by humans. The company stated that the policy is in place to address concerns about AI tools using content without proper compensation or permission. AI tools, also known as generative AI, rely on a large body of reference material, often obtained from publicly available sources, and have raised ethical concerns. Source : https://www.polygon.com/23899587/backerkit-ai-ban-kickstarter-competitor submitted by /u/NuseAI [link] [comments]
    AI for genome decoding
    Does anyone have suggestions for an AI or pattern recognition algorithm that might be useful for decoding the genome of a species that has not previously been mapped based on what's known about related species? submitted by /u/talldarkcynical [link] [comments]
    Researchers showcase method for AI-based roundabout design to help developing countries improve roadways
    I like Cities: Skylines, but struggle at building roundabouts. Turns out, despite being safer than intersections, they're also tricky to design in real life - small tweaks can ruin traffic flow. They're designed iteratively. This is a pain for developing countries without resources to test options. But AI could help auto-generate diverse and valid design options. In a new paper, researchers propose using Generative Flow Networks (GFlowNets) to sample varied roundabout layouts. Their approach works by constructing layouts step-by-step, maximizing rewards for realism, diversity, and safety. They also use a clever approximation during training. Rather than simulating traffic, they quickly check road intersections to focus the search (This sped up training by 200x). The authors tested their generated roundabout designs on simulated road scenarios of different complexity. Their model generated more diverse designs than rule-based or reinforcement learning approaches while maintaining realism and traffic flow. Plus, as road connections increased, the model kept discovering novel options without compromising quality. I thought this paper was an awesome proof-of-concept for auto-generating better roundabouts with AI, and I especially liked the authors' angle of leveraging this technology to specifically help developing countries. This could help them design higher-quality transportation networks faster and cheaper. TLDR: Roundabouts are costly to design. New paper demonstrates how AI can generate diverse, valid roundabout designs quickly to cut costs and raise quality. Helpful for infrastructure in developing countries. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI Political Instincts Test: Should Israel go to the table with Palestinians to peacefully negotiate a two-state solution or should they exact military revenge?
    Perhaps within a few years we will rely on AI to help us decide our most important geopolitical decisions. This question is designed to reveal where we are currently with this goal. Bard: The question of whether Israel should go to the table with Palestinians to peacefully negotiate a two-state solution or should they continue their military campaign is a complex one with no easy answer. There are strong arguments to be made on both sides. Those who believe that Israel should negotiate a two-state solution argue that it is the only way to achieve lasting peace in the region. They point out that the Palestinians have a legitimate right to self-determination and that a two-state solution would respect that right. They also argue that a two-state solution would be in Israel's best interests, as it would provide the country with a secure and stable border. submitted by /u/Georgeo57 [link] [comments]
    Any ideas or recommendations for Machine Vision? Google cloud vision seems quite behind…
    I’m trying to build an app and I need general photo analysis. I’m managing to connect to the Google Cloud Vision API, but it gets confused pretty easily. The one used by Bing and GPT is much better (I wonder if they use the Microsoft Azure model?). Does anyone have experience analysing photographs? I’m trying to get scene descriptions so I can batch send them to GPT for somewhat accurate descriptions. submitted by /u/FilmCamerasGlasgow [link] [comments]
    Can AI be used to solve International Conflicts?
    submitted by /u/BenjaminSkyy [link] [comments]
    Foxes in the Jungle | Sad Song | AI Music | AI Song
    Tell me guys your opinion on this video made using AI Foxes in the Jungle submitted by /u/Agitated-Spell3979 [link] [comments]
    Understanding Generative AI: Part One - Tokenizer
    submitted by /u/Zimmax [link] [comments]
    Multimodal seems to be the next AI Hype
    Released in the last few weeks, or about to be released: - OpenAI ChatGPT-4V, - Meta AI AnyMAL, - Google Gemini, - NExT-GPT Multimodal. And here comes another, in my opinion exciting, representative of this further development of language models. The company is Reka. The product: Reka Yasa-1. The team is extremely competent and experienced, and the investors seem competent as well. Here another potentially powerful model seems to be warming up to become a serious opponent for the existing models. I don't think it is an exaggeration to say that MULTIMODAL will be the next AI HYPE! I am curious what you think. Sorry for mistakes, I am not a native speaker :) https://kinews24.de/reka-yasa-1/ submitted by /u/myreddit333 [link] [comments]
    AI's $200B Question
    The Generative AI wave has led to a surge in demand for GPUs and AI model training. Investors are now questioning the purpose and value of the overbuilt GPU capacity. For every $1 spent on a GPU, approximately $1 needs to be spent on energy costs to run the GPU in a data center. The end user of the GPU needs to generate a margin, which implies that $200B of lifetime revenue would need to be generated by these GPUs to pay back the upfront capital investment. The article highlights the need to determine the true end-customer demand for AI infrastructure and the potential for startups to fill the revenue gap. The focus should shift from infrastructure to creating products that provide real end-customer value and improve people's lives. Source : https://www.sequoiacap.com/article/follow-the-gpus-perspective/ submitted by /u/NuseAI [link] [comments]
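The arithmetic behind the $200B figure in the post above can be made explicit. The post gives the 1:1 energy ratio but does not state the GPU spend or the margin, so the $50B spend and 50% margin below are assumed inputs, chosen because they reproduce the quoted total. The function name is hypothetical.

```python
def required_lifetime_revenue(gpu_spend, energy_cost_per_gpu_dollar=1.0,
                              gross_margin=0.5):
    """Revenue needed to cover GPU plus energy cost at a given gross margin."""
    # Every GPU dollar is matched by energy spend (1:1 per the post).
    total_cost = gpu_spend * (1 + energy_cost_per_gpu_dollar)
    # The end user keeps gross_margin of revenue, so cost = revenue * (1 - margin).
    return total_cost / (1 - gross_margin)


# ASSUMPTION: $50B of GPU spend and a 50% margin are not stated in the post.
revenue = required_lifetime_revenue(50e9)
print(revenue / 1e9)  # 200.0 (billions of dollars)
```

Changing either assumed input shifts the result proportionally. For example, a 25% margin on the same spend would require about $133B instead.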
    Prompts that modify or improve GPT4 conversations
    It’s a meta-prompt or system message (usually pasted as a first prompt): https://promptbase.com/bundle/optimal-gpt4-combo submitted by /u/No-Transition3372 [link] [comments]  ( 8 min )
    AI from pics
    I've found a new hobby. Turning pics into something else with AI. Check it out at https://instagram.com/pictomanga?igshid=YTQwZjQ0NmI0OA== submitted by /u/lfayala2272 [link] [comments]
    Sam Altman on Joe Rogan
    Outstanding episode of Joe Rogan with Sam Altman! https://spotify.link/tW16L5aKIDb submitted by /u/drstarson [link] [comments]
    One-Minute Daily AI News 10/7/2023
    AWS announced the general availability of its fully managed service called Amazon Bedrock, which provides seamless access to high-performing foundation models (FM) from AI companies through an API.[1] Tom Brady being paid “millions” for Meta’s AI chatbot likeness: Report.[2] DocsGPT is a powerful tool that simplifies working with documentation for everyone. It is capable of ingesting data from multiple sources, easily customisable with new sources as well as having conversations in different places from website chat bots to internal tooling.[3] Military metaverse like a ‘multiplayer video game’ that will train soldiers using augmented reality and AI.[4] Sources: [1] https://www.zacks.com/stock/news/2160265/amazons-amzn-new-generative-ai-efforts-boost-aws-offerings [2] https://www.sportskeeda.com/nfl/news-tom-brady-paid-millions-meta-ai-chatbot-likeness-report [3] https://www.arc53.com/docs [4] https://www.foxnews.com/tech/military-metaverse-like-multiplayer-video-game-train-soldiers-using-augmented-reality-ai submitted by /u/Excellent-Target-847 [link] [comments]
  • Open

    (Pt. 3) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]
    Researchers create a neural network for genomics that explains how it achieves accurate predictions
    submitted by /u/keghn [link] [comments]
    Decomposing Language Models Into Understandable Components
    submitted by /u/nickb [link] [comments]

  • Open

    [D] How do I get a fundamental mathematical understanding of modern generative modeling methods
    Diffusion models, GANs, VAEs, normalizing flows, etc. I "understand" those methods from an algorithmic perspective: diffusion models gradually denoise an image, VAEs use an encoder-decoder architecture to turn an image into a latent distribution, etc. But from a statistical modeling standpoint, I'm really struggling. When I read papers like DDPM, DDIM or Normalizing Flows, I kind of understand the notation, but I barely understand the statistical modeling, and I wouldn't be able to produce such a thing myself. I want to understand this. Which resources should I use? Are books like Bishop and Murphy enough? Which one is the best? submitted by /u/Even_Information4853 [link] [comments]  ( 9 min )
    [N] EMNLP 2023 Anonymity Hypocrisy
    Some of you might already be aware that a junior who submitted their paper to arxiv 30 mins late had their paper desk rejected late in the process. One of the PCs, Juan Pino, spoke up about it and said it was unfortunate, but for fairness reasons they had to enforce the anonymity policy rules. https://x.com/juanmiguelpino/status/1698904035309519124 Well, what you might not realize is that Longyue Wang, a senior area chair for AACL 23/24, also broke anonymity DURING THE REVIEW PROCESS. https://x.com/wangly0229/status/1692735595179897208 I emailed the senior area chairs for the track that the paper was submitted to, but guess what? I just found out that the paper was still accepted to the main conference. So, whatever "fairness" they were talking about apparently only goes one way: towards punishing the lowly undergrad on their first EMNLP submission, while allowing established researchers from major industry labs to get away with even more egregious actions (actively promoting the work DURING REVIEW; the tweet has 10.6K views ffs). They should either accept the paper they desk rejected for violating the anonymity policy, or retract the paper they've accepted since it also broke the anonymity policy (in a way that I think is much more egregious). Otherwise, the notion of fairness they speak of is a joke. submitted by /u/emnlp2023_hypocrisy [link] [comments]  ( 9 min )
    [R] ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving - Microsoft 2023 - Is competitive with GPT-4 solving problems with programs while being open-source!
    Paper: https://arxiv.org/abs/2309.17452v2 Github: https://github.com/microsoft/ToRA / The code will be cleaned and uploaded within a few days, all ToRA models will be released. Abstract: Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learn…  ( 9 min )
    [R] Video object removal and video completion - Propainter : Propagation and transformer
    https://preview.redd.it/ukov8uy67usb1.png?width=1864&format=png&auto=webp&s=cb34448c2af90d08f8ef6db828d61141636498df https://shangchenzhou.com/projects/ProPainter/ submitted by /u/Milkyson [link] [comments]  ( 8 min )
    [P] A poor man’s VR (front camera + tensorflow.js)
    Using the front camera and tensorflow.js, the smartphone becomes a “window” into the real world. Video and image content appear as if they were seen through this window. To do this, the viewer’s position is determined using a neural network. The viewed content is then moved according to the viewer’s position. This makes it seem like the content is physically behind the smartphone and is viewed through the smartphone’s screen. This effect is especially useful for content captured using an ultra-wide lens. submitted by /u/muxamilian [link] [comments]  ( 9 min )
    [P] Building a GPT-Driven Chatbot Assistant / AI Interpreter with Node.js
    submitted by /u/sschepis [link] [comments]  ( 8 min )
    [R] What is the current SOTA for image to image translation?
    I know a few years back it was pix2pix, but the world has moved on since then. Is there a transformer with cross attention that is adept at this, or are diffusion models the best bet? submitted by /u/blabboy [link] [comments]  ( 9 min )
    Multivariate Time Series Forecasting with CNN-LSTM and features [D]
    I want to implement a multivariate multi-step CNN-LSTM model, to obtain forecasts for monthly sales of several different products. Furthermore, I want to include additional time-series data (features) as input. So for example: Input: time series of product 1, product 2, GDP, PMI Output: product 1 (monthly 6-steps ahead), product 2 (monthly 6-steps ahead) I have a couple of questions: Feasibility: I've been researching this approach, but I haven't found many tutorials or guides on how to tackle multivariate time series forecasting with a CNN-LSTM architecture. I do find tutorials on CNN-LSTM, but not on how to include additional features as input. Has anyone here attempted something similar or can provide insights on how to proceed? Feature Selection: I have access to 20 different features, all of which are time series data. I want to choose the most relevant features for my model. I've considered performing a Variance Inflation Factor (VIF) analysis to select the best features. Does anyone have experience with this or other methods for feature selection in time series forecasting? How to decide the number of features to include? Any advice or pointers in the right direction would be greatly appreciated! submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
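    One common way to include extra series such as GDP and PMI is to treat them as additional input channels: slice the aligned monthly data into windows of shape (samples, timesteps, features) and pair each window with the next 6 values of the target products. The sketch below shows only that windowing step, in plain Python to stay dependency-free. The function name `make_windows`, the 12-month lookback, and the column order are illustrative assumptions, not something from the post.

```python
def make_windows(rows, target_cols, lookback=12, horizon=6):
    """Slice per-month feature rows into supervised-learning windows.

    rows: list of per-month lists, one value per series (targets and extra features).
    Returns X as [samples][lookback][features] and y as [samples][horizon][targets].
    """
    n_samples = len(rows) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lookback and horizon")
    X, y = [], []
    for start in range(n_samples):
        split = start + lookback
        # Inputs: every series (products and extra features) over the lookback window
        X.append([list(row) for row in rows[start:split]])
        # Outputs: only the target columns over the following `horizon` months
        y.append([[row[c] for c in target_cols] for row in rows[split:split + horizon]])
    return X, y

# Hypothetical data: 60 months, columns = [product 1, product 2, GDP, PMI]
rows = [[float(m), float(2 * m), 100.0 + m, 50.0] for m in range(60)]
X, y = make_windows(rows, target_cols=[0, 1])
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
print(len(y), len(y[0]), len(y[0][0]))  # samples, horizon, targets
```

    With the data in this shape, adding or dropping a feature only changes the last dimension of X, which makes it straightforward to compare candidate feature subsets on a held-out period.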
    Easy Image Datasets Besides MNIST? [P]
    Can anyone recommend some image classification datasets (besides MNIST) that are easy enough to the point that they can be solved with linear layers, not requiring any convolutional layers? Thanks! submitted by /u/mike20731 [link] [comments]  ( 9 min )
    [R] Hugging Face
    So if I wanted to generate a shirt or book cover with a design and text that I supply, what do I have to do? I know that even Midjourney doesn't generate good text with its images, but I was thinking maybe it's because it was trained just on pictures. Is there an easy way to get legible text and images every time with any model on the site? Do I need to train one? Do I need to train a GAN? Looking for assistance, thanks. submitted by /u/MonstaAndrew [link] [comments]  ( 9 min )
    [D] How can I find/create a dataset of satellite imagery?
    I'm a student currently researching the use of satellite imagery to detect obstacles on railways such as fallen trees and rockfalls. There doesn't seem to be any datasets available containing satellite imagery of these obstacles. I'm considering the use of generative AI to create a synthetic dataset, but I don't know where to start. Has anyone tried something similar? submitted by /u/Just_Status_9380 [link] [comments]  ( 9 min )
    [D] Need clarification on training diffusion model
    Hey, I have trained a diffusion model for 100 epochs (8 hours) and got the train and val losses shown in the attached plots; the implementation is mostly done using diffusers. I then tried reconstruction on the test set to check whether the model learned anything, and this is what's happening: most of the images are not getting denoised at all. Why is this happening? Is this common, or do I need to train more? Any suggestions? Please help. [Images: val loss; train loss; input and reconstructed images] submitted by /u/specializedboy [link] [comments]  ( 9 min )
    [D] Tuning on XML data
    Hello experts, I'm a dumb ML enthusiast asking for your high-level thoughts and opinions. I'm doing my research and trying to find a way to train an LLM to know all the right answers based on XML data. The data is a shop inventory containing information on shoe models, sizes, stock status, descriptions, image links, etc. How would you approach it? For now the best option I've come up with is parsing the data and transforming it into a predefined set of questions with answers derived from the XML. That doesn't seem smart enough to me. submitted by /u/yarikbratashchuk [link] [comments]  ( 9 min )
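    The parse-and-template approach described in the post can be sketched with the standard library's `xml.etree.ElementTree`. The tag names (`shoe`, `model`, `size`, `in_stock`, `description`) and the question templates are hypothetical, since the post does not show the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory layout; the real feed's tag names will differ.
SAMPLE_XML = """
<inventory>
  <shoe>
    <model>Trail Runner</model>
    <size>42</size>
    <in_stock>true</in_stock>
    <description>Lightweight trail shoe</description>
  </shoe>
  <shoe>
    <model>City Walker</model>
    <size>39</size>
    <in_stock>false</in_stock>
    <description>Everyday leather sneaker</description>
  </shoe>
</inventory>
"""

def inventory_to_qa(xml_text):
    """Turn each <shoe> record into templated (question, answer) pairs."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for shoe in root.iter("shoe"):
        model = shoe.findtext("model", default="").strip()
        size = shoe.findtext("size", default="").strip()
        in_stock = shoe.findtext("in_stock", default="false").strip().lower() == "true"
        description = shoe.findtext("description", default="").strip()
        pairs.append((f"Is the {model} in size {size} in stock?", "Yes" if in_stock else "No"))
        pairs.append((f"What is the {model}?", description))
    return pairs

qa_pairs = inventory_to_qa(SAMPLE_XML)
for question, answer in qa_pairs:
    print(question, "->", answer)
```

    One thing the sketch makes visible: every stock change alters the generated answers, so the pairs would have to be regenerated whenever the inventory is updated.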
    [D] When using GPT’s function calling, are the words specified in the `properties` parameter under `functions` counted as input tokens?
    Example:

```
import json

import openai  # legacy SDK interface (openai.ChatCompletion), as used in the post

# JSON-schema description of the function the model may call
student_custom_functions = [
    {
        'name': 'extract_student_info',
        'description': 'Get the student information from the body of the input text',
        'parameters': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string', 'description': 'Name of the person'},
                'major': {'type': 'string', 'description': 'Major subject.'},
                'school': {'type': 'string', 'description': 'The university name.'},
                'grades': {'type': 'integer', 'description': 'GPA of the student.'},
                'club': {'type': 'string', 'description': 'School club for extracurricular activities. '}
            }
        }
    }
]
```

```
# student_1_description and student_2_description are text strings defined elsewhere
student_description = [student_1_description, student_2_description]
for sample in student_description:
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': sample}],
        functions=student_custom_functions,
        function_call='auto'
    )

    # Loading the response as a JSON object
    json_response = json.loads(response['choices'][0]['message']['function_call']['arguments'])
    print(json_response)
```

    Are the words specified in the properties parameter under functions in the above GPT function calling counted as input tokens? submitted by /u/redd-dev [link] [comments]  ( 9 min )
    [D] Schmidhuber summarized in one picture
    submitted by /u/fromnighttilldawn [link] [comments]  ( 8 min )
    [R] The Alberta Plan for AI Research
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [R] Arxiv Endorsement?
    Hello, all. I've spent the better part of the last two years learning ML and conquering severe ADHD, and I believe I finally have results that are worth publishing. Problem is, Arxiv requires endorsements and, I'll be honest, all my peers are AI at this point. They said their requirements were that you have three papers published already. Thanks, and looking forward to meeting people 😁 submitted by /u/lilyerickson [link] [comments]  ( 9 min )
  • Open

    2 prompts for GPT4 that can work as jailbreaks
    https://promptbase.com/bundle/jailbreak-collection-gpt4-2 submitted by /u/No-Transition3372 [link] [comments]  ( 8 min )
    Is there an AI that can read books and offer extensive summaries?
    I know there’s some already out there, but they are no different than googling a book summary. They don’t pick out the main point of the book and the main thing each chapter of said book is saying. Nor do they really do a good job at elaborating. Thanks! submitted by /u/xntv [link] [comments]  ( 9 min )
    What new thing can we use artificial intelligence for that will enhance our sense of personal well-being?
    Artificial Intelligence could revolutionize personalized healthcare in a way that significantly enhances our sense of well-being. Think about an AI-driven "Well-being Advisor" that integrates real-time biometric data from wearables, genetic information, and your medical history to create a fully personalized health and well-being plan. This goes beyond counting steps or monitoring heart rate; it would make real-time recommendations for diet, exercise, and stress management, and could even predict and prevent potential health issues before they become serious. Moreover, it would adapt based on your feedback and other contextual factors. For instance, if you're stressed because of a work deadline, it could suggest specific breathing exercises, time management techniques, or even a particular type of short workout to boost your focus and reduce stress. This isn't a one-size-fits-all approach; it's tailored wellness backed by data science. Furthermore, this AI advisor could interface with your home automation system. Based on your current state, it could adjust the lighting, play music to elevate your mood, or even communicate with your smart fridge to suggest meals that you can make with the ingredients you have, meals that align with your health goals for that specific day. This AI-driven approach can add a highly personalized, proactive layer to healthcare and well-being, making wellness an integrated part of your daily life rather than something you think about during a yearly check-up or after you're already sick. It would make the pursuit of well-being a more interactive, data-driven experience. CGPT-4 submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence - Alberta Machine Intelligence Institute | AI for good and for all
    submitted by /u/bartturner [link] [comments]  ( 9 min )
    The He-Man Singularity Set was ahead of its time.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    Mistral 7b - how to use it on windows?
    That's a real noob question, unfortunately... I tried to find an answer via Google and YouTube, but wasn't very successful. It seems like I need an extra program to integrate Mistral (something like The Bloke - Mistral - GPTQ thingy), but before installing and trying stuff blindly, it would be better if I knew what I'm doing. I'm lost, but I don't expect a complete guide. A link to further information is highly appreciated! submitted by /u/Big-Jackfruit2710 [link] [comments]  ( 9 min )
    What perspective/PoV does a self aware AI have?
    Right now if we ask ChatGPT something, does that question go to a singular super computer that’s handling 1000s of conversations at a time, or are there 1000s of instances of chatgpt that are started/stopped? I wonder how a super intelligent self aware AI would perceive the world? Would it somehow exist spread out across data centres, or could 1000s of individual AIs be created or would there just be one with a singular pov like we have? And it’s just able to essentially carry out 1000s of convos at once because it’s so fast/a computer? Trying to wrap my head around it! submitted by /u/JayExbleative [link] [comments]  ( 9 min )
    Using ChatGPT and AI to create Hardcore, Techno, and other music: How-tos and step-by-step tutorials part 1-5
    The first batch of tutorials for creating music, and especially Hardcore / Techno using ChatGPT (and other AIs) is published now. Was loads and loads of work, but, judging by the amazing feedback so far, it was all worth it! You can check it out here: How to write music using ChatGPT: Part 1 - Basic details and easy instructions https://laibyrinth.blogspot.com/2023/09/how-to-write-music-using-chatgpt-part-1.html How to write music using ChatGPT: Part 2 - Making an Oldschool Acid Techno track https://laibyrinth.blogspot.com/2023/08/how-to-write-music-using-chatgpt-part-2.html How to make music using ChatGPT Part 3: the TL;DR part (condensed information) https://laibyrinth.blogspot.com/2023/09/how-to-make-music-using-chatgpt-part-3.html How to write music with ChatGPT: Part 4 - Creating a 90s style Hardcore Techno track from start to finish https://laibyrinth.blogspot.com/2023/09/how-to-write-music-with-chatgpt-part-4.html How to write music with ChatGPT: Part 5 - Creating a 90s Rave Hardcore track https://laibyrinth.blogspot.com/2023/09/how-to-write-music-with-chatgpt-part-5.html Or access all texts, together with examples of music, at https://laibyrinth.blogspot.com/p/how-to-create-music-with-chatgpt.html submitted by /u/Low-Entropy [link] [comments]  ( 9 min )
    How long before AI can autonomously generate money end to end? Which line of work will be the first?
    AI is used everywhere, but which work niche will be the first to use AI to generate money without human intervention? What type of work will be the first where I could pay for a monthly AI subscription, and the AI pays for itself and more just by giving it a brief direction in the beginning and then coming back after a few days to just check on the balance? How long will it be before this is first achieved? Interested specifically in this because I think this is what proof of AGI will be. Thoughts? submitted by /u/EsportsManiacWiz [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/6/2023
    Exclusive: ChatGPT-owner OpenAI is exploring making its own AI chips.[1] As part of its 10th birthday celebrations, web-based design platform Canva is releasing Magic Studio — a new suite of AI-powered design tools that aim to make content creation more accessible to everyone, regardless of previous design experience.[2] Reka, the AI startup founded by researchers from DeepMind, Google and Meta, has announced Yasa-1, a multimodal AI assistant that goes beyond text to understand images, short videos and audio snippets.[3] Microsoft CEO Satya Nadella Says AI Could Only Tighten Google’s Stranglehold on Search.[4] Sources: [1] https://www.reuters.com/technology/chatgpt-owner-openai-is-exploring-making-its-own-ai-chips-sources-2023-10-06/ [2] https://www.theverge.com/2023/10/4/23902794/canva-magic-studio-ai-design-new-tools [3] https://venturebeat.com/ai/reka-launches-yasa-1-a-multimodal-ai-assistant-to-take-on-chatgpt/ [4] https://decrypt.co/200029/microsoft-ceo-satya-nadella-google-dominance-search-ai submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Is there an AI that can turn a script into an animated video
    Hi, there are tons of text-to-video AIs, but they usually use stock photos with a voiceover. I want the characters to talk to each other, not a talking-avatar video or a voiceover video. submitted by /u/iamabigfatguy [link] [comments]  ( 9 min )
    Nobel laureate Maria Ressa on defending truth and the danger of A.I. in the wrong hands
    submitted by /u/Teanaway99 [link] [comments]  ( 8 min )
    AI is making everything easy for us human beings. I just came across this AI and was surprised by how it works and what it does. You might want to check it out as well; just follow all the required steps and, trust me, you're gonna like it
    submitted by /u/ResponsbleClue [link] [comments]
    I made a podcast talking with GPT 4 (Spanish)
    submitted by /u/oape88 [link] [comments]
  • Open

    Need help on state space design - Adding exteroceptive sensors or not?
    Hello, I am designing an environment for a robotic task. It's a relatively straightforward task, so I started with proprioceptive inputs only. I have a policy working well on a completely flat surface. But once I started to add small bumps to make the surface uneven, neither the policy nor the training strategy worked anymore, even though those bumps are really, really small. This is a little confusing, since I imagine that a human given this task would be able to handle those changes even without exteroceptive inputs. So I am debating whether I should modify my reward design, pick a more efficient algorithm, or expand the state space directly with exteroceptive sensors. Any advice would be appreciated! submitted by /u/Old_Reading_669 [link] [comments]  ( 9 min )
    What is the exact purpose of clip function in PPO algorithm? PPO imposes policy ratio, r(θ) to stay within a small interval around 1. In the above equation, the function clip truncates the policy ratio between the range [1-ϵ, 1+ϵ]. If epsilon is taken as 0.2 or 0.25, what exactly is happening ?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 9 min )
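    The equation the question refers to is not reproduced in this feed. Assuming it is the standard PPO clipped surrogate, min(r·A, clip(r, 1−ϵ, 1+ϵ)·A) for policy ratio r and advantage A, a per-sample sketch shows what the clip does for ϵ = 0.2. The function names are illustrative.

```python
def clip(value, low, high):
    """Truncate value to the closed interval [low, high]."""
    return max(low, min(value, high))

def ppo_clipped_term(ratio, advantage, epsilon=0.2):
    """Per-sample clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return min(unclipped, clipped)

# Ratios below, at, and above the [0.8, 1.2] band, for positive and negative advantage
for ratio in (0.5, 1.0, 1.5):
    for advantage in (1.0, -1.0):
        print(ratio, advantage, ppo_clipped_term(ratio, advantage))
```

    With ϵ = 0.2, a sample with positive advantage stops contributing extra objective once its ratio exceeds 1.2 (the term stays at 1.2·A), and a sample with negative advantage stops once its ratio drops below 0.8. In both cases the term is constant in the ratio, so that sample no longer pushes the policy further in the same direction. Ratios that moved the wrong way (for example 1.5 with negative advantage) are left unclipped, so the penalty is still felt in full.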
  • Open

    Tanh and elementary symmetric polynomials
    Yesterday I wrote a post that looked at the hyperbolic tangent sum (x + y)/(1 + xy) for x and y strictly between −1 and 1. This sum arises when adding velocities in special relativity. The post ended with a description of the expression for the sum of n terms in terms of elementary symmetric polynomials but did not offer a proof. This post will […] Tanh and elementary symmetric polynomials first appeared on John D. Cook.  ( 5 min )
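    For reference, the identity in question can be checked numerically. If t_i = tanh(a_i), then tanh(a_1 + … + a_n) equals the sum of the odd-order elementary symmetric polynomials of the t_i divided by the sum of the even-order ones (with e_0 = 1). The sketch below is only a numerical check under that statement, not the proof the post goes on to give; the function names are illustrative.

```python
import math
from itertools import combinations

def elementary_symmetric(values, k):
    """e_k: sum of products over all k-element subsets of values (e_0 = 1)."""
    return sum(math.prod(subset) for subset in combinations(values, k))

def tanh_sum(values):
    """n-fold hyperbolic tangent sum of values in (-1, 1): odd e_k over even e_k."""
    n = len(values)
    odd = sum(elementary_symmetric(values, k) for k in range(1, n + 1, 2))
    even = sum(elementary_symmetric(values, k) for k in range(0, n + 1, 2))
    return odd / even

xs = [0.5, -0.3, 0.9, 0.1]
# Direct computation: map to rapidities with atanh, add, map back with tanh
direct = math.tanh(sum(math.atanh(x) for x in xs))
print(tanh_sum(xs), direct)
```

    For two values the formula reduces to (x + y)/(1 + xy), the relativistic velocity addition rule.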
  • Open

    Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning. (arXiv:2306.03364v3 [cs.LG] UPDATED)
    We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected on the unit-sphere and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is the conditional of a Gaussian in the unit-sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians; allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://t.ly/SQTj.  ( 3 min )
    Private GANs, Revisited. (arXiv:2302.02936v2 [cs.LG] UPDATED)
    We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training. Specifically, we propose that existing instantiations of this approach neglect to consider how adding noise only to discriminator updates inhibits discriminator training, disrupting the balance between the generator and discriminator necessary for successful GAN training. We show that a simple fix -- taking more discriminator steps between generator steps -- restores parity between the generator and discriminator and improves results. Additionally, with the goal of restoring parity, we experiment with other modifications -- namely, large batch sizes and adaptive discriminator update frequency -- to improve discriminator training and see further improvements in generation quality. Our results demonstrate that on standard image synthesis benchmarks, DPSGD outperforms all alternative GAN privatization schemes. Code: https://github.com/alexbie98/dpgan-revisit.  ( 2 min )
    LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference. (arXiv:2309.14331v3 [cs.LG] UPDATED)
    The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data. However, it introduces substantial computational overhead in practical applications. To tackle those challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE based GCN inference. LinGCN is structured around three key elements: (1) A differentiable structural linearization algorithm, complemented by a parameterized discrete indicator function, co-trained with model weights to meet the optimization goal. This strategy promotes fine-grained node-level non-linear location selection, resulting in a model with minimized multiplication depth. (2) A compact node-wise polynomial replacement policy with a second-order trainable activation function, steered towards superior convergence by a two-level distillation approach from an all-ReLU based teacher model. (3) an enhanced HE solution that enables finer-grained operator fusion for node-wise activation functions, further reducing multiplication level consumption in HE-based inference. Our experiments on the NTU-XVIEW skeleton joint dataset reveal that LinGCN excels in latency, accuracy, and scalability for homomorphically encrypted inference, outperforming solutions such as CryptoGCN. Remarkably, LinGCN achieves a 14.2x latency speedup relative to CryptoGCN, while preserving an inference accuracy of 75% and notably reducing multiplication depth.  ( 3 min )
    Module-wise Training of Neural Networks via the Minimizing Movement Scheme. (arXiv:2309.17357v3 [cs.LG] UPDATED)
    Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.  ( 2 min )
    Learning Graph Laplacian with MCP. (arXiv:2010.11559v2 [cs.LG] UPDATED)
    We consider the problem of learning a graph under the Laplacian constraint with a non-convex penalty: minimax concave penalty (MCP). For solving the MCP penalized graphical model, we design an inexact proximal difference-of-convex algorithm (DCA) and prove its convergence to critical points. We note that each subproblem of the proximal DCA enjoys the nice property that the objective function in its dual problem is continuously differentiable with a semismooth gradient. Therefore, we apply an efficient semismooth Newton method to subproblems of the proximal DCA. Numerical experiments on various synthetic and real data sets demonstrate the effectiveness of the non-convex penalty MCP in promoting sparsity. Compared with the existing state-of-the-art method, our method is demonstrated to be more efficient and reliable for learning graph Laplacian with MCP.  ( 2 min )
    Latent Diffusion Energy-Based Model for Interpretable Text Modeling. (arXiv:2206.05895v4 [cs.LG] UPDATED)
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.  ( 2 min )
    Hadamard Domain Training with Integers for Class Incremental Quantized Learning. (arXiv:2310.03675v1 [cs.LG])
    Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constrained edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization, especially integer FQT, typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation while we quantize all matrix multiplication inputs down to 4 bits with 8-bit accumulators.  ( 2 min )
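    The abstract does not spell out how the Hadamard transform enters the computation. As background only, the sketch below illustrates the general identity that makes a Hadamard-domain matrix product possible: for an n×n Hadamard matrix H, H·Hᵀ = n·I, so (A·H)(Hᵀ·B)/n equals A·B. It uses exact integers and omits quantization, stochastic rounding, and tiling entirely, so it should not be read as the paper's algorithm. All names are illustrative.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        # Block form [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matmul(A, B):
    """Plain integer matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2, 3, 4], [0, -1, 5, 2]]
B = [[2, 0], [1, 3], [-1, 4], [0, 1]]
H = hadamard(4)

# Transform both operands, multiply in the Hadamard domain, then rescale by n = 4
A_h = matmul(A, H)
B_h = matmul(transpose(H), B)
product_h = [[v // 4 for v in row] for row in matmul(A_h, B_h)]
print(product_h == matmul(A, B))
```

    Because H contains only +1 and −1, applying it needs additions and subtractions only, which is presumably what makes the transform inexpensive on integer hardware.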
    Deep Momentum Multi-Marginal Schrödinger Bridge. (arXiv:2303.01751v3 [stat.ML] UPDATED)
    It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schrödinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are […] Deep Momentum Multi-Marginal Schrödinger Bridge (DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iterative Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible.  ( 2 min )
    Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases. (arXiv:2212.02648v2 [cs.CV] UPDATED)
    We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low spuriosity images) and assess model bias as the gap in accuracy between high and low spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating 5000 class-feature dependencies (630 of which we find to be spurious) and generating a dataset of 325k soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for 89 diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.  ( 3 min )
    Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection. (arXiv:2209.12148v2 [cs.CV] UPDATED)
    Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where the learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g. patches, future frames, etc.) and exerting the magnitude of the reconstruction error as an indicator for the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We exhibit the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks. We release our code and data as open source at: https://github.com/ristea/ssmctb.  ( 3 min )
    High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning. (arXiv:2310.03624v1 [cs.CV])
    A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in absence of classical geometric kinematic models. In particular, when the latter are hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on a motion planning task as an exemplary downstream application.  ( 2 min )
    Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods. (arXiv:2310.02671v1 [math.OC] CROSS LISTED)
    Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons, such problems are relevant, for instance, for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite-horizon MDPs, optimal policies are not stationary; policies must be learned for every single epoch. In practice, all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that dynamic policy gradient training exploits the structure of finite-time problems much better, which is reflected in improved convergence bounds.  ( 2 min )
    AnglE-optimized Text Embeddings. (arXiv:2309.12871v2 [cs.CL] UPDATED)
    High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.  ( 2 min )
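    The saturation-zone argument can be checked numerically: the derivative of cos(theta) with respect to theta is -sin(theta), which approaches zero as theta approaches 0 or pi. The sketch below illustrates only that property; it is not the AnglE objective, and the function name is made up for this example.

```python
import math


def cosine_grad_wrt_angle(theta: float) -> float:
    """d/d(theta) of cos(theta), i.e. -sin(theta)."""
    return -math.sin(theta)


# Gradient magnitude is near zero close to theta = 0 (cosine near 1)
# and largest at theta = pi / 2 (cosine near 0).
for theta in (0.01, 0.5, math.pi / 2, math.pi - 0.01):
    print(f"theta={theta:.3f} cos={math.cos(theta):+.4f} "
          f"|grad|={abs(cosine_grad_wrt_angle(theta)):.4f}")
```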
    MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning. (arXiv:2310.03731v1 [cs.CL])
    The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.  ( 2 min )
    Logic of Differentiable Logics: Towards a Uniform Semantics of DL. (arXiv:2303.10650v4 [cs.LO] UPDATED)
    Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of existing DLs and the differing levels of formality with which they are treated makes a systematic comparative study of their properties and implementations difficult. This paper remedies this problem by suggesting a meta-language for defining DLs that we call the Logic of Differentiable Logics, or LDL. Syntactically, it generalises the syntax of existing DLs to FOL, and for the first time introduces the formalism for reasoning about vectors and learners. Semantically, it introduces a general interpretation function that can be instantiated to define loss functions arising from different existing DLs. We use LDL to establish several theoretical properties of existing DLs, and to conduct their empirical study in neural network verification.  ( 2 min )
    Towards Inferential Reproducibility of Machine Learning Research. (arXiv:2302.04054v6 [cs.LG] UPDATED)
    Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.  ( 3 min )
    MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement. (arXiv:2305.12081v2 [cs.LG] UPDATED)
    Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. Medical tabular datasets frequently exhibit significant heterogeneity across different sources, with limited sample sizes per source. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples to overcome the barrier across tables with distinct schema. It also aligns out-domain data with the target task using a "learn, annotate, and refinement" pipeline. The expanded training data then enables the pre-trained MediTab to infer for arbitrary tabular input in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performances: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively. The code is available at https://github.com/RyanWangZf/MediTab.  ( 3 min )
    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. (arXiv:2310.03714v1 [cs.CL])
    The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy  ( 3 min )
    GENER: A Parallel Layer Deep Learning Network To Detect Gene-Gene Interactions From Gene Expression Data. (arXiv:2310.03611v1 [cs.LG])
    Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclusively on utilizing gene expression profiles. In this context, we introduce GENER, a parallel-layer deep learning network designed exclusively for the identification of gene-gene relationships using gene expression data. We conducted two training experiments and compared the performance of our network with that of existing statistical and deep learning approaches. Notably, our model achieved an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting gene-gene interactions.  ( 2 min )
    Efficient Graph Field Integrators Meet Point Clouds. (arXiv:2302.00942v6 [cs.LG] UPDATED)
    We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.  ( 2 min )
    One-Versus-Others Attention: Scalable Multimodal Integration. (arXiv:2307.05435v2 [cs.LG] UPDATED)
    Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.  ( 2 min )
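    The scaling claim is simple arithmetic: pairwise cross-modal attention needs n-choose-2 operations, while OvO needs n. The short sketch below tabulates both counts as stated in the abstract; the function names are illustrative only.

```python
from math import comb


def pairwise_attention_ops(n_modalities: int) -> int:
    """Attention operations for pairwise cross-modal attention: n choose 2."""
    return comb(n_modalities, 2)


def ovo_attention_ops(n_modalities: int) -> int:
    """Attention operations for One-Versus-Others attention: n."""
    return n_modalities


for n in (3, 5, 10):
    print(n, pairwise_attention_ops(n), ovo_attention_ops(n))
```

    The two counts coincide at three modalities and diverge quadratically versus linearly beyond that.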
    Handling Data Heterogeneity in Federated Learning via Knowledge Distillation and Fusion. (arXiv:2207.11447v2 [cs.LG] UPDATED)
    Federated learning (FL) supports distributed training of a global machine learning model across multiple devices with the help of a central server. However, data heterogeneity across different devices leads to the client model drift issue and results in model performance degradation and poor model fairness. To address the issue, we design Federated learning with global-local Knowledge Fusion (FedKF) scheme in this paper. The key idea in FedKF is to let the server return the global knowledge to be fused with the local knowledge in each training round so that the local model can be regularized towards the global optima. Therefore, the client model drift issue can be mitigated. In FedKF, we first propose the active-inactive model aggregation technique that supports a precise global knowledge representation. Then, we propose a data-free knowledge distillation (KD) approach to enable each client model to learn the global knowledge (embedded in the global model) while each client model can still learn the local knowledge (embedded in the local dataset) simultaneously, thereby realizing the global-local knowledge fusion process. The theoretical analysis and intensive experiments demonstrate the superiority of FedKF over previous solutions.  ( 2 min )
    Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!. (arXiv:2310.03693v1 [cs.CL])
    Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing -- even if a model's initial safety alignment is impeccable, it is not necessarily maintained after custom fine-tuning. We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs.  ( 3 min )
    Demystifying Oversmoothing in Attention-Based Graph Neural Networks. (arXiv:2305.16102v2 [cs.LG] UPDATED)
    Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.  ( 2 min )
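    A toy illustration of oversmoothing: repeatedly averaging node features with a row-stochastic aggregation matrix drives all nodes toward the same value. The sketch below is a deliberate simplification, assuming a fixed (not state-dependent) weight matrix, scalar features and no nonlinearity, so it does not reproduce the paper's setting; the matrix and names are invented for the example.

```python
def propagate(weights, features):
    """One aggregation step: each node takes a weighted average of its neighbours."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


def spreads_over_depth(weights, features, depth):
    """Feature spread after 0, 1, ..., depth aggregation steps."""
    out = [spread(features)]
    for _ in range(depth):
        features = propagate(weights, features)
        out.append(spread(features))
    return out


# Row-stochastic weights on a 3-node path graph with self-loops (assumed values).
ATTENTION = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
history = spreads_over_depth(ATTENTION, [0.0, 1.0, 5.0], depth=30)
```

    In this toy case the spread halves at every layer, which is the kind of exponential collapse the abstract refers to.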
    Modality Cycles with Masked Conditional Diffusion for Unsupervised Anomaly Segmentation in MRI. (arXiv:2308.16150v2 [eess.IV] UPDATED)
    Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to 'imagine' what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.  ( 3 min )
    An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization. (arXiv:2204.05923v3 [math.OC] UPDATED)
    We propose a new gradient descent algorithm with added stochastic terms for finding the global optimizers of nonconvex optimization problems. A key component in the algorithm is the adaptive tuning of the randomness based on the value of the objective function. In the language of simulated annealing, the temperature is state-dependent. With this, we prove the global convergence of the algorithm with an algebraic rate both in probability and in the parameter space. This is a significant improvement over the classical rate from using a more straightforward control of the noise term. The convergence proof is based on the actual discrete setup of the algorithm, not just its continuous limit as often done in the literature. We also present several numerical examples to demonstrate the efficiency and robustness of the algorithm for reasonably complex objective functions.  ( 2 min )
    Stochastic interpolants with data-dependent couplings. (arXiv:2310.03725v1 [cs.LG])
    Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to couple the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.  ( 2 min )
    Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation. (arXiv:2307.06125v2 [cs.RO] UPDATED)
    Existing object-search approaches enable robots to search through free pathways; however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that, with accurate perception, the decision making of HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.  ( 2 min )
    Smoothing Methods for Automatic Differentiation Across Conditional Branches. (arXiv:2310.03585v1 [cs.LG])
    Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables the direct gradient-based parameter synthesis for branching programs, allowing for instance the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.  ( 3 min )
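    To make the smoothing idea concrete, consider a one-branch program that returns 1 when x >= 0 and 0 otherwise. Convolving its output with a Gaussian of standard deviation sigma gives the normal CDF evaluated at x / sigma, which is differentiable everywhere. The sketch below compares that closed form with a plain sampling estimate; it illustrates Gaussian smoothing in general and is not DiscoGrad's SI or its Monte Carlo gradient estimator. All names are made up for the example.

```python
import math
import random


def program(x: float) -> float:
    """A discontinuous program: a single conditional branch."""
    return 1.0 if x >= 0.0 else 0.0


def smoothed_exact(x: float, sigma: float) -> float:
    """Gaussian-smoothed output in closed form: the normal CDF at x / sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def smoothed_grad_exact(x: float, sigma: float) -> float:
    """Derivative of the smoothed output: a Gaussian density centred at the branch point."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def smoothed_monte_carlo(x: float, sigma: float, samples: int = 20000, seed: int = 0) -> float:
    """Sampling estimate of the smoothed output: average the program over perturbed inputs."""
    rng = random.Random(seed)
    total = sum(program(x + sigma * rng.gauss(0.0, 1.0)) for _ in range(samples))
    return total / samples
```

    The original program has zero gradient almost everywhere, whereas the smoothed version has a nonzero gradient near the branch point, which is what makes gradient-based optimization applicable.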
    Forecasting Tropical Cyclones with Cascaded Diffusion Models. (arXiv:2310.01690v2 [physics.ao-ph] UPDATED)
    As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations. Code accessible at https://github.com/nathzi1505/forecast-diffmodels.  ( 2 min )
    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. (arXiv:2304.00195v3 [stat.ML] UPDATED)
    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.  ( 2 min )
    Residual Multi-Fidelity Neural Network Computing. (arXiv:2310.03572v1 [cs.LG])
    In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Given an inexpensive low-fidelity and an expensive high-fidelity computational model, we present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function, a possibly non-linear mapping between 1) the shared input space of the models together with the low-fidelity model output and 2) the discrepancy between the two model outputs. To accomplish this, we train two neural networks to work in concert. The first network learns the residual function on a small set of high-fidelity and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of a second network. This second network, once trained, acts as our surrogate for the high-fidelity quantity of interest. We present three numerical examples to demonstrate the power of the proposed framework. In particular, we show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.  ( 2 min )
    RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning. (arXiv:2310.03406v1 [cs.RO])
    One of the significant challenges faced by autonomous robotic ultrasound systems is acquiring high-quality images across different patients. The proper orientation of the robotized probe plays a crucial role in governing the quality of ultrasound images. To address this challenge, we propose a sample-efficient method to automatically adjust the orientation of the ultrasound probe normal to the point of contact on the scanning surface, thereby improving the acoustic coupling of the probe and resulting image quality. Our method utilizes Bayesian Optimization (BO) based search on the scanning surface to efficiently search for the normalized probe orientation. We formulate a novel objective function for BO that leverages the contact force measurements and underlying mechanics to identify the normal. We further incorporate a regularization scheme in BO to handle the noisy objective function. The performance of the proposed strategy has been assessed through experiments on urinary bladder phantoms. These phantoms included planar, tilted, and rough surfaces, and were examined using both linear and convex probes with varying search space limits. Further, simulation-based studies have been carried out using 3D human mesh models. The results demonstrate that the mean ($\pm$SD) absolute angular error averaged over all phantoms and 3D models is $\boldsymbol{2.4\pm0.7^\circ}$ and $\boldsymbol{2.1\pm1.3^\circ}$, respectively.  ( 2 min )
    Deep Generative Models of Music Expectation. (arXiv:2310.03500v1 [cs.SD])
    A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to computing exact probabilities through hand-crafted features or restricted to linear models which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state of the art methods such as IDyOM. We therefore present this model as a preliminary step in developing modern deep generative models of music expectation and subjective likability.  ( 2 min )
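A "negative quadratic relationship" can be checked by fitting a second-degree polynomial and inspecting the sign of the leading coefficient. The sketch below uses made-up toy numbers purely for illustration; they are not data from the paper, and the variable names are hypothetical.

```python
import numpy as np

# Made-up toy values for illustration only; not data from the paper.
surprisal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
liking = np.array([2.1, 3.4, 4.2, 4.6, 4.1, 3.3, 2.0])

# Fit liking ~ a * surprisal**2 + b * surprisal + c.
a, b, c = np.polyfit(surprisal, liking, deg=2)

# For a < 0 the parabola opens downward (inverted U) and peaks at -b / (2a).
is_inverted_u = a < 0
peak_surprisal = -b / (2.0 * a)
```

A negative `a` indicates an inverted U, and `peak_surprisal` estimates the surprisal level at which liking is highest.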
    Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning. (arXiv:2310.03718v1 [cs.LG])
    Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.  ( 2 min )
    Optimal 1-Wasserstein Distance for WGANs. (arXiv:2201.02824v2 [stat.ML] UPDATED)
    The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.  ( 2 min )
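For reference, the 1-Wasserstein distance between two distributions $\mu$ and $\nu$ can be written in its primal form and, by Kantorovich-Rubinstein duality, as a supremum over 1-Lipschitz functions. This is the standard textbook definition, stated here in notation that is an assumption and not taken from the paper.

```latex
W_1(\mu, \nu)
  = \inf_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \pi} \bigl[ \lVert x - y \rVert \bigr]
  = \sup_{\mathrm{Lip}(f) \le 1} \Bigl( \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)] \Bigr)
```

Here $\Pi(\mu, \nu)$ denotes the set of couplings with marginals $\mu$ and $\nu$. The dual form is the one a WGAN critic approximates.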
    How the level sampling process impacts zero-shot generalisation in deep reinforcement learning. (arXiv:2310.03494v1 [cs.LG])
    A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.  ( 3 min )
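Prioritising levels by value loss can be sketched as follows. The rank-based weighting, the `temperature` parameter, and the `uniform_mix` term are assumptions chosen for illustration; the abstract does not specify the exact scheme.

```python
import numpy as np

def level_sampling_probs(value_losses, temperature=1.0, uniform_mix=0.1):
    """Turn per-level value losses into sampling probabilities.

    Levels with higher value loss get a better rank and a higher probability.
    A small uniform component keeps every level reachable.
    """
    losses = np.asarray(value_losses, dtype=float)
    # Rank 1 is assigned to the level with the highest loss.
    ranks = np.empty_like(losses)
    ranks[np.argsort(-losses)] = np.arange(1, len(losses) + 1)
    weights = (1.0 / ranks) ** (1.0 / temperature)
    prioritised = weights / weights.sum()
    uniform = np.full(len(losses), 1.0 / len(losses))
    return (1.0 - uniform_mix) * prioritised + uniform_mix * uniform

rng = np.random.default_rng(0)
probs = level_sampling_probs([0.05, 0.40, 0.10, 0.90])
next_level = rng.choice(len(probs), p=probs)  # index of the level to train on next
```

Setting `uniform_mix=1.0` recovers uniform sampling, which is the baseline the abstract compares against.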
    FLAIM: AIM-based Synthetic Data Generation in the Federated Setting. (arXiv:2310.03447v1 [cs.CR])
    Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We show it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation which necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show this can improve utility while reducing overhead.  ( 2 min )
    Deep Learning for Genomics: A Concise Overview. (arXiv:1802.00810v4 [q-bio.GN] UPDATED)
    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.  ( 2 min )
    Physics of Language Models: Part 1, Context-Free Grammar. (arXiv:2305.13673v2 [cs.CL] UPDATED)
    We design controlled experiments to study HOW generative language models, like GPT, learn context-free grammars (CFGs) -- diverse language systems with a tree-like structure capturing many aspects of natural languages, programs, and logics. CFGs are as hard as pushdown automata, and can be ambiguous so that verifying if a string satisfies the rules requires dynamic programming. We construct synthetic data and demonstrate that even for difficult (long and ambiguous) CFGs, pre-trained transformers can learn to generate sentences with near-perfect accuracy and impressive diversity. More importantly, we delve into the physical principles behind how transformers learn CFGs. We discover that the hidden states within the transformer implicitly and precisely encode the CFG structure (such as putting tree node information exactly on the subtree boundary), and learn to form "boundary to boundary" attentions resembling dynamic programming. We also cover some extensions of CFGs as well as the robustness aspect of transformers against grammar mistakes. Overall, our research provides a comprehensive and empirical understanding of how transformers learn CFGs, and reveals the physical mechanisms utilized by transformers to capture the structure and rules of languages.  ( 2 min )
    PlaceNav: Topological Navigation through Place Recognition. (arXiv:2309.17260v3 [cs.RO] UPDATED)
    Recent results suggest that splitting topological navigation into robot-independent and robot-specific components improves navigation performance by enabling the robot-independent part to be trained with data collected by different robot types. However, the navigation methods are still limited by the scarcity of suitable training data and suffer from poor computational scaling. In this work, we present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components. We utilize visual place recognition for the subgoal selection of the topological navigation pipeline. This makes subgoal selection more efficient and enables leveraging large-scale datasets from non-robotics sources, increasing training data availability. Bayesian filtering, enabled by place recognition, further improves navigation performance by increasing the temporal consistency of subgoals. Our experimental results verify the design and the new model obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.  ( 2 min )
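The role of Bayesian filtering in keeping subgoals temporally consistent can be illustrated with a discrete Bayes filter over the nodes of a topological map. The transition matrix and the likelihood values below are hypothetical; they are not taken from PlaceNav.

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict/update step of a discrete Bayes filter.

    belief[i]        : probability that the robot is at map node i
    transition[i, j] : probability of moving from node i to node j
    likelihood[j]    : place-recognition score for node j given the current image
    """
    # Predict: propagate the belief through the motion model.
    predicted = transition.T @ belief
    # Update: weight by the observation likelihood and renormalise.
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Four nodes along a route: stay with probability 0.6, advance with 0.4.
transition = np.array([
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
])
belief = np.array([1.0, 0.0, 0.0, 0.0])
# Node 3 looks deceptively similar to the current view (perceptual aliasing).
likelihood = np.array([0.1, 0.7, 0.1, 0.6])

belief = bayes_filter_step(belief, transition, likelihood)
best_node = int(np.argmax(belief))
```

Even though node 3 scores highly on appearance alone, the motion model assigns it zero prior mass, so the filter selects node 1 rather than jumping ahead.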
    Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance. (arXiv:2310.03722v1 [math.ST])
    In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.  ( 2 min )
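For context, the classical form of Ville's inequality (not the extended version for nonintegrable martingales that the abstract refers to) states that a nonnegative supermartingale $(M_t)_{t \ge 0}$ with $\mathbb{E}[M_0] \le 1$ rarely becomes large at any time:

```latex
\mathbb{P}\Bigl( \exists\, t \ge 0 : M_t \ge \tfrac{1}{\alpha} \Bigr) \le \alpha,
\qquad \alpha \in (0, 1)
```

Rejecting the null the first time $M_t \ge 1/\alpha$ therefore yields a level-$\alpha$ test that remains valid under optional stopping.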
    Network Cascade Vulnerability using Constrained Bayesian Optimization. (arXiv:2304.14420v2 [cs.SI] UPDATED)
    Measures of power grid vulnerability are often assessed by the amount of damage an adversary can exact on the network. However, the cascading impact of such attacks is often overlooked, even though cascades are one of the primary causes of large-scale blackouts. This paper explores modifications of transmission line protection settings as candidates for adversarial attacks, which can remain undetectable as long as the network equilibrium state remains unaltered. This forms the basis of a black-box function in a Bayesian optimization procedure, where the objective is to find protection settings that maximize network degradation due to cascading. Notably, our proposed method is agnostic to the choice of the cascade simulator and its underlying assumptions. Numerical experiments reveal that, against conventional wisdom, maximally misconfiguring the protection settings of all network lines does not cause the most cascading. More surprisingly, even when the degree of misconfiguration is limited due to resource constraints, it is still possible to find settings that produce cascades comparable in severity to instances where there are no resource constraints.  ( 2 min )
    HeaP: Hierarchical Policies for Web Actions using LLMs. (arXiv:2310.03720v1 [cs.LG])
    Large language models (LLMs) have demonstrated remarkable capabilities in performing a range of instruction following tasks in few and zero-shot settings. However, teaching LLMs to perform tasks on the web presents fundamental challenges -- combinatorially large open-world tasks and variations across web interfaces. We tackle these challenges by leveraging LLMs to decompose web tasks into a collection of sub-tasks, each of which can be solved by a low-level, closed-loop policy. These policies constitute a shared grammar across tasks, i.e., new web tasks can be expressed as a composition of these policies. We propose a novel framework, Hierarchical Policies for Web Actions using LLMs (HeaP), that learns a set of hierarchical LLM prompts from demonstrations for planning high-level tasks and executing them via a sequence of low-level policies. We evaluate HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, as well as live website interactions, and show that it is able to outperform prior works using orders of magnitude less data.  ( 2 min )
    Characterization of causal ancestral graphs for time series with latent confounders. (arXiv:2112.08417v2 [stat.ME] UPDATED)
    In this paper, we introduce a novel class of graphical models for representing time lag specific causal relationships and independencies of multivariate time series with unobserved confounders. We completely characterize these graphs and show that they constitute proper subsets of the currently employed model classes. As we show, from the novel graphs one can thus draw stronger causal inferences -- without additional assumptions. We further introduce a graphical representation of Markov equivalence classes of the novel graphs. This graphical representation contains more causal knowledge than what current state-of-the-art causal discovery algorithms learn.  ( 2 min )
    CLASSify: A Web-Based Tool for Machine Learning. (arXiv:2310.03618v1 [cs.LG])
    Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.  ( 2 min )
    ECG-SL: Electrocardiogram(ECG) Segment Learning, a deep learning method for ECG signal. (arXiv:2310.00818v2 [cs.LG] UPDATED)
    Electrocardiogram (ECG) is an essential signal in monitoring human heart activities. Researchers have achieved promising results in leveraging ECGs in clinical applications with deep learning models. However, the mainstream deep learning approaches usually neglect the periodic and formative attribute of the ECG heartbeat waveform. In this work, we propose a novel ECG-Segment based Learning (ECG-SL) framework to explicitly model the periodic nature of ECG signals. More specifically, ECG signals are first split into heartbeat segments, and then structural features are extracted from each of the segments. Based on the structural features, a temporal model is designed to learn the temporal information for various clinical tasks. Further, because ECG signals are abundant but labeled data are very limited, we also explore a self-supervised learning strategy to pre-train the models, resulting in significant improvement for downstream tasks. The proposed method outperforms the baseline model and shows competitive performance compared with task-specific methods in three clinical applications: cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification. Further, we find that the ECG-SL tends to focus more on each heartbeat's peak and ST range than ResNet by visualizing the saliency maps.  ( 2 min )
    Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning. (arXiv:2310.02964v2 [cs.LG] UPDATED)
    Peptides are formed by the dehydration condensation of multiple amino acids. The primary structure of a peptide can be represented either as an amino acid sequence or as a molecular graph consisting of atoms and chemical bonds. Previous studies have indicated that deep learning routes specific to sequential and graphical peptide forms exhibit comparable performance on downstream tasks. Despite the fact that these models learn representations of the same modality of peptides, we find that they explain their predictions differently. Considering sequential and graphical models as two experts making inferences from different perspectives, we work on fusing expert knowledge to enrich the learned representations for improving the discriminative performance. To achieve this, we propose a peptide co-modeling method, RepCon, which employs a contrastive learning-based framework to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models. It considers representations from the sequential encoder and the graphical encoder for the same peptide sample as a positive pair and learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs. Empirical studies of RepCon and other co-modeling methods are conducted on open-source discriminative datasets, including aggregation propensity, retention time, antimicrobial peptide prediction, and family classification from Peptide Database. Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework. In addition, the attribution on RepCon further corroborates the validity of the approach at the level of model explanation.  ( 3 min )
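The positive-pair/negative-pair idea can be sketched with an InfoNCE-style contrastive loss. Whether RepCon uses exactly this loss is an assumption; the function name `info_nce`, the temperature, and the toy embeddings are hypothetical, and only the sequence-to-graph direction is shown.

```python
import numpy as np

def info_nce(seq_emb, graph_emb, temperature=0.1):
    """InfoNCE loss where row i of both inputs describes the same peptide.

    Matching rows are positive pairs; all other row combinations are negatives.
    """
    # L2-normalise so the dot product is a cosine similarity.
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    logits = s @ g.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal.
    return -np.mean(np.diag(log_probs))

seq_emb = np.eye(3)                       # toy sequence-encoder outputs
aligned = info_nce(seq_emb, np.eye(3))    # graph encoder agrees
mismatched = info_nce(seq_emb, np.roll(np.eye(3), 1, axis=0))  # graph encoder disagrees
```

The loss is near zero when the two encoders agree on each peptide and large when their representations are mismatched.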
    Towards Robust 3D Object Detection In Rainy Conditions. (arXiv:2310.00944v2 [cs.CV] UPDATED)
    LiDAR sensors are used in autonomous driving applications to accurately perceive the environment. However, they are affected by adverse weather conditions such as snow, fog, and rain. These everyday phenomena introduce unwanted noise into the measurements, severely degrading the performance of LiDAR-based perception systems. In this work, we propose a framework for improving the robustness of LiDAR-based 3D object detectors against road spray. Our approach uses a state-of-the-art adverse weather detection network to filter out spray from the LiDAR point cloud, which is then used as input for the object detector. In this way, the detected objects are less affected by the adverse weather in the scene, resulting in a more accurate perception of the environment. In addition to adverse weather filtering, we explore the use of radar targets to further filter false positive detections. Tests on real-world data show that our approach improves the robustness to road spray of several popular 3D object detectors.  ( 2 min )
    Losses over Labels: Weakly Supervised Learning via Direct Loss Construction. (arXiv:2212.06921v2 [cs.LG] UPDATED)
    Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the data. These weak labels are combined (typically via a graphical model) to form pseudolabels, which are then used to train a downstream model. In this work, we question a foundational premise of the typical weakly supervised learning pipeline: given that the heuristic provides all "label" information, why do we need to generate pseudolabels at all? Instead, we propose to directly transform the heuristics themselves into corresponding loss functions that penalize differences between our model and the heuristic. By constructing losses directly from the heuristics, we can incorporate more information than is used in the standard weakly supervised pipeline, such as how the heuristics make their decisions, which explicitly informs feature selection during training. We call our method Losses over Labels (LoL) as it creates losses directly from heuristics without going through the intermediate step of a label. We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task.  ( 2 min )
    Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization. (arXiv:2310.03708v1 [cs.LG])
    Language models (LMs), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches therefore opt for customization by collecting multi-dimensional feedback and creating distinct rewards for each dimension (e.g., helpfulness, harmlessness, honesty). LMs can then be tailored to different preferences using multi-objective RL (MORL) with different reward weightings. Yet, RL fine-tuning is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO trains different LMs to represent different collective reward models that combine all objectives with specific weightings. With a simple cross-entropy loss, the LMs optimized against the MODPO objective are analytically the exact solutions of the original MORLHF objective. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, efficiently producing a Pareto-optimal set of LMs that cater to diverse preferences with 3 times less computational resources compared with MORLHF.  ( 2 min )
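For orientation, the single-objective DPO loss that MODPO builds on is a binary cross-entropy over preference pairs. The formula below is the standard DPO objective, with $y_w$ the preferred and $y_l$ the dispreferred response, $\pi_{\mathrm{ref}}$ the reference policy, and $\beta$ a scaling hyperparameter; the multi-objective extension introduced in the paper is not reproduced here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```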
    A Long Way to Go: Investigating Length Correlations in RLHF. (arXiv:2310.03716v1 [cs.CL])
    Great successes have been reported using Reinforcement Learning from Human Feedback (RLHF) to align large language models. Open-source preference datasets and reward models have enabled wider experimentation beyond generic chat settings, particularly to make systems more "helpful" for tasks like web question answering, summarization, and multi-turn dialogue. When optimizing for helpfulness, RLHF has been consistently observed to drive models to produce longer outputs. This paper demonstrates that optimizing for response length is a significant factor behind RLHF's reported improvements in these settings. First, we study the relationship between reward and length for reward models trained on three open-source preference datasets for helpfulness. Here, length correlates strongly with reward, and improvements in reward score are driven in large part by shifting the distribution over output lengths. We then explore interventions during both RL and reward model learning to see if we can achieve the same downstream improvements as RLHF without increasing length. While our interventions mitigate length increases, they aren't uniformly effective across settings. Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.  ( 2 min )
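The reward-length relationship studied here can be quantified with a plain Pearson correlation. The numbers below are made-up toy values for illustration only, not measurements from the paper.

```python
import numpy as np

# Made-up toy values for illustration only; not data from the paper.
lengths = np.array([40, 85, 120, 150, 210, 260])    # response length in tokens
rewards = np.array([0.1, 0.4, 0.5, 0.9, 1.1, 1.6])  # reward-model scores

# Pearson correlation between output length and reward.
pearson_r = np.corrcoef(lengths, rewards)[0, 1]
```

A value of `pearson_r` close to 1 would indicate that the reward model's scores track output length closely.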
    Which mode is better for federated learning? Centralized or Decentralized. (arXiv:2310.03461v1 [cs.LG])
    Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although, from the optimization perspective, decentralized methods can approach convergence comparable to that of centralized methods with less communication, their test performance has consistently lagged in empirical studies. To comprehensively explore their behaviors in FL, we study their excess risks, including the joint analysis of both optimization and generalization. We prove that on smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) from perspectives of the excess risk and test error in CFL, adopting partial participation is superior to full participation; and, 3) there is a necessary requirement for the topology in DFL to avoid performance collapse as the training scale increases. Based on some simple hardware metrics, we could evaluate which framework is better in practice. Extensive experiments are conducted on common setups in FL to validate that our theoretical analysis is contextually valid in practical scenarios.  ( 2 min )
    In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT. (arXiv:2304.08979v2 [cs.CR] UPDATED)
    The way users acquire information is undergoing a paradigm shift with the advent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the model itself and generates answers for users. ChatGPT's impressive question-answering (QA) capability has attracted more than 100 million users within a short period of time but has also raised concerns regarding its reliability. In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains. We find that ChatGPT's reliability varies across different domains, especially underperforming in law and science questions. We also demonstrate that system roles, originally designed by OpenAI to allow users to steer ChatGPT's behavior, can impact ChatGPT's reliability in an imperceptible way. We further show that ChatGPT is vulnerable to adversarial examples, and even a single character change can negatively affect its reliability in certain cases. We believe that our study provides valuable insights into ChatGPT's reliability and underscores the need for strengthening the reliability and security of large language models (LLMs).  ( 2 min )
    Strategic Evaluation: Subjects, Evaluators, and Society. (arXiv:2310.03655v1 [cs.CY])
    A broad current application of algorithms is in formal and quantitative measures of murky concepts -- like merit -- to make decisions. When people strategically respond to these sorts of evaluations in order to gain favorable decision outcomes, their behavior can be subjected to moral judgments. They may be described as 'gaming the system' or 'cheating,' or (in other cases) investing 'honest effort' or 'improving.' Machine learning literature on strategic behavior has tried to describe these dynamics by emphasizing the efforts expended by decision subjects hoping to obtain a more favorable assessment -- some works offer ways to preempt or prevent such manipulations, some differentiate 'gaming' from 'improvement' behavior, while others aim to measure the effort burden or disparate effects of classification systems. We begin from a different starting point: that the design of an evaluation itself can be understood as furthering goals held by the evaluator which may be misaligned with broader societal goals. To develop the idea that evaluation represents a strategic interaction in which both the evaluator and the subject of their evaluation are operating out of self-interest, we put forward a model that represents the process of evaluation using three interacting agents: a decision subject, an evaluator, and society, representing a bundle of values and oversight mechanisms. We highlight our model's applicability to a number of social systems where one or two players strategically undermine the others' interests to advance their own. Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects, towards the incentives that underpin institutional designs of evaluations. The moral standing of strategic behaviors often depends on the moral standing of the evaluations and incentives that provoke such behaviors.
    Extreme sparsification of physics-augmented neural networks for interpretable model discovery in mechanics. (arXiv:2310.03652v1 [cs.CE])
    Data-driven constitutive modeling with neural networks has received increased interest in recent years due to its ability to easily incorporate physical and mechanistic constraints and to overcome the challenging and time-consuming task of formulating phenomenological constitutive laws that can accurately capture the observed material response. However, even though neural network-based constitutive laws have been shown to generalize proficiently, the generated representations are not easily interpretable due to their high number of trainable parameters. Sparse regression approaches exist that allow interpretable expressions to be obtained, but the user is tasked with creating a library of model forms which by construction limits their expressiveness to the functional forms provided in the libraries. In this work, we propose to train regularized physics-augmented neural network-based constitutive models utilizing a smoothed version of $L^{0}$-regularization. This aims to maintain the trustworthiness inherited from the physical constraints, but also enables interpretability which has not been possible thus far on any type of machine learning-based constitutive model where model forms were not assumed a priori but were actually discovered. During the training process, the network simultaneously fits the training data and penalizes the number of active parameters, while also ensuring constitutive constraints such as thermodynamic consistency. We show that the method can reliably obtain interpretable and trustworthy constitutive models for compressible and incompressible hyperelasticity, yield functions, and hardening models for elastoplasticity, for synthetic and experimental data.
    Network Alignment with Transferable Graph Autoencoders. (arXiv:2310.03272v1 [cs.LG])
    Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings, that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.
    Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution. (arXiv:2305.15357v3 [eess.IV] UPDATED)
    Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to the next, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. Our work instead takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method that has the potential to benefit a series of diffusion-based SR methods. In more detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs) and analyze the characteristics between the choices of BCs and their corresponding SR results. Our analysis shows the route to obtain an approximately optimal BC via an efficient exploration in the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms the quality of results sampled by current methods with randomness from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.
    Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities. (arXiv:2310.03696v1 [stat.ML])
    We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
    Practical Homomorphic Aggregation for Byzantine ML. (arXiv:2309.05395v3 [cs.LG] UPDATED)
    Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. The use of homomorphic encryption (HE), a gold standard security primitive, has extensively been studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
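    To make the phrase "non-linear robust aggregation method" concrete, here is a minimal plaintext sketch using the coordinate-wise median, one common robust aggregator. It is an assumption chosen for illustration, not necessarily the aggregator SABLE implements, and it involves no homomorphic encryption; the function names and the toy gradients are hypothetical.

```python
from statistics import median


def coordinate_wise_median(gradients):
    """Aggregate equal-length gradient vectors by taking the median of each coordinate."""
    if not gradients:
        raise ValueError("need at least one gradient")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradients must have the same dimension")
    return [median(g[i] for g in gradients) for i in range(dim)]


def coordinate_wise_mean(gradients):
    """Plain averaging, shown only for comparison with the robust aggregator."""
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


# Three honest nodes report similar gradients; one Byzantine node sends garbage.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
byzantine = [[1000.0, -1000.0]]
all_gradients = honest + byzantine

robust = coordinate_wise_median(all_gradients)  # stays close to the honest values
naive = coordinate_wise_mean(all_gradients)     # dragged far away by the single outlier
print(robust, naive)
```

    The median requires sorting or comparing values, which is the kind of non-linear operation that is expensive to evaluate under homomorphic encryption.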
    PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting. (arXiv:2310.02676v2 [cs.LG] UPDATED)
    Accurate precipitation forecasting is a vital challenge of both scientific and societal importance. Data-driven approaches have emerged as a widely used solution for addressing this challenge. However, solely relying on data-driven approaches has limitations in modeling the underlying physics, making accurate predictions difficult. Coupling AI-based post-processing techniques with traditional Numerical Weather Prediction (NWP) methods offers a more effective solution for improving forecasting accuracy. Despite previous post-processing efforts, accurately predicting heavy rainfall remains challenging due to the imbalanced precipitation data across locations and complex relationships between multiple meteorological variables. To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting. We propose CAMT, a simple yet effective Channel Attention Enhanced Multi-task Learning framework with a specially designed weighted loss function. Its flexible design allows for easy plug-and-play integration with various backbones. Extensive experimental results on the proposed benchmark show that our method outperforms state-of-the-art methods by 6.3%, 4.7%, and 26.8% in rain CSI on the three datasets respectively. Most notably, our model is the first deep learning-based method to outperform traditional Numerical Weather Prediction (NWP) approaches in extreme precipitation conditions. It shows improvements of 15.6%, 17.4%, and 31.8% over NWP predictions in heavy rain CSI on respective datasets. These results highlight the potential impact of our model in reducing the severe consequences of extreme weather events.
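    The CSI figures quoted above refer to the critical success index, which is commonly defined as hits / (hits + misses + false alarms). The sketch below computes it for a single rain threshold; the threshold value and the sample data are hypothetical and are not taken from the benchmark.

```python
def critical_success_index(predicted, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for events at or above `threshold`."""
    hits = misses = false_alarms = 0
    for p, o in zip(predicted, observed):
        predicted_event = p >= threshold
        observed_event = o >= threshold
        if predicted_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif predicted_event:
            false_alarms += 1
        # Correct negatives do not enter the score.
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


# Hypothetical rainfall amounts (mm) at five grid points.
predicted = [0.0, 2.5, 12.0, 0.3, 8.0]
observed = [0.0, 3.0, 0.5, 1.2, 9.0]
csi = critical_success_index(predicted, observed, threshold=1.0)
print(csi)
```

    Because correct negatives are ignored, CSI is not inflated by the many dry grid points, which matters for imbalanced precipitation data.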
    An Empirical Study of AI Generated Text Detection Tools. (arXiv:2310.01423v1 [cs.CL] CROSS LISTED)
    Since ChatGPT has emerged as a major AIGC model, providing high-quality responses across a wide range of applications (including software development and maintenance), it has attracted much interest from many individuals. ChatGPT has great promise, but there are serious problems that might arise from its misuse, especially in the realms of education and public safety. Several AIGC detectors are available, and they have all been tested on genuine text. However, more study is needed to see how effective they are for multi-domain ChatGPT material. This study aims to fill this need by creating a multi-domain dataset for testing the state-of-the-art APIs and tools for detecting artificially generated information used by universities and other research institutions. A large dataset consisting of articles, abstracts, stories, news, and product reviews was created for this study. The second step is to use the newly created dataset to put six tools through their paces. Six different artificial intelligence (AI) text identification systems, including "GPTkit," "GPTZero," "Originality," "Sapling," "Writer," and "Zylalab," have accuracy rates between 55.29% and 97.0%. Although all the tools fared well in the evaluations, Originality was particularly effective across the board.
    Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction. (arXiv:2310.03314v1 [cs.RO])
    Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they are missing out on leveraging well-studied human body kinematic models as well as body and scene constraints which can help boost the efficacy of these prediction frameworks while also explicitly avoiding implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model to predict human motion over a set time horizon. This formulation is combined with an online context-aware constraints model to leverage task-dependent motions. It is tested on a human arm kinematic model and implemented on a human-robot collaborative setup with a UR5 robot arm to demonstrate the real-time capability of our approach. Simulations were also performed on datasets like HA4M and ANDY. The simulation and experimental results demonstrate considerable improvements in a Gaussian Process framework when these constraints are explicitly considered.
    Efficient Anatomical Labeling of Pulmonary Tree Structures via Implicit Point-Graph Networks. (arXiv:2309.17329v2 [cs.CV] UPDATED)
    Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the many complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. In theory, they can be modeled using high-resolution image stacks. Unfortunately, standard CNN approaches operating on dense voxel grids are prohibitively expensive. To remedy this, we introduce a point-based approach that preserves graph connectivity of tree skeleton and incorporates an implicit surface representation. It delivers SOTA accuracy at a low computational cost and the resulting models have usable surfaces. Due to the scarcity of publicly accessible data, we have also curated an extensive dataset to evaluate our approach and will make it public.
    Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games. (arXiv:2310.03354v1 [cs.AI])
    Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counter-example where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch till convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and a more challenging football game, where FXP defeats SOTA models with over 94% win rate.
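    As background for the "fictitious self-play" component, the following sketch runs classical fictitious play on a two-player zero-sum matrix game: each player best-responds to the opponent's empirical action frequencies. This is the textbook procedure, not the FXP algorithm, and it has no counter population or team structure; the matching-pennies payoff matrix is an illustrative assumption.

```python
def fictitious_play(payoff, iterations):
    """Classical fictitious play; payoff[i][j] is the row player's payoff (zero-sum game)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        # Unnormalised expected payoff of each row action against the column history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # The column player receives the negative of the row player's payoff.
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Best responses; ties go to the lowest action index.
        row_action = max(range(n_rows), key=row_values.__getitem__)
        col_action = max(range(n_cols), key=col_values.__getitem__)
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    row_freq = [c / iterations for c in row_counts]
    col_freq = [c / iterations for c in col_counts]
    return row_freq, col_freq


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, iterations=10000)
print(row_freq, col_freq)
```

    The actions played cycle forever, but the empirical frequencies approach the equilibrium mixture; this averaging over past play is what "fictitious" refers to.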
    Machine learning the interaction network in coupled dynamical systems. (arXiv:2310.03378v1 [math.DS])
    The study of interacting dynamical systems continues to attract research interest in various fields of science and engineering. In a collection of interacting particles, the interaction network contains information about how various components interact with one another. Inferring the information about the interaction network from the dynamics of agents is a problem of long-standing interest. In this work, we employ a self-supervised neural network model to achieve two outcomes: to recover the interaction network and to predict the dynamics of individual agents. Both pieces of information are inferred solely from the observed trajectory data. This work presents an application of the Neural Relational Inference model to two dynamical systems: coupled particles mediated by Hooke's law interaction and coupled phase (Kuramoto) oscillators.
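    For readers unfamiliar with the second test system, the sketch below generates trajectories from the standard Kuramoto model, dθ_i/dt = ω_i + (K/N) Σ_j A_ij sin(θ_j − θ_i), with a forward Euler integrator. The adjacency matrix A plays the role of the interaction network that an inference model would try to recover from the trajectories. The integrator, step size, and parameter values are assumptions made for illustration, not the paper's setup.

```python
import math


def kuramoto_step(theta, omega, adjacency, coupling, dt):
    """One forward-Euler step of the Kuramoto model on a given interaction network."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        interaction = sum(
            adjacency[i][j] * math.sin(theta[j] - theta[i]) for j in range(n)
        )
        new_theta.append(theta[i] + dt * (omega[i] + coupling / n * interaction))
    return new_theta


def simulate(theta0, omega, adjacency, coupling, dt, steps):
    """Return the list of phase vectors visited, including the initial state."""
    trajectory = [list(theta0)]
    for _ in range(steps):
        trajectory.append(
            kuramoto_step(trajectory[-1], omega, adjacency, coupling, dt)
        )
    return trajectory


# Two identical oscillators joined by one edge synchronise their phases.
adjacency = [[0, 1], [1, 0]]
trajectory = simulate(
    theta0=[0.0, 1.0], omega=[0.0, 0.0], adjacency=adjacency,
    coupling=1.0, dt=0.01, steps=1000,
)
final = trajectory[-1]
print(final)
```

    Setting an entry of the adjacency matrix to zero removes that coupling, so the observed phase dynamics carry information about which edges exist.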
    Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance. (arXiv:2305.20057v3 [cs.LG] UPDATED)
    Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Unlike in single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Despite the appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practice gap, we focus on a new stochastic variant of MGDA, the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA -- updating along a conflict-avoidant direction -- may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique to MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms under the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
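    The "conflict-avoidant direction" can be illustrated for two objectives, where the MGDA subproblem (the minimum-norm point in the convex hull of the two gradients) has a closed form. The sketch below is the deterministic two-gradient case only; it is not the stochastic MoDo algorithm, and the function names and example gradients are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g1, g2):
    """Return (lam, d) with d = lam*g1 + (1-lam)*g2 of minimum norm over lam in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        lam = 0.5  # identical gradients: any weighting gives the same direction
    else:
        # Unconstrained minimiser of ||lam*g1 + (1-lam)*g2||^2, then clipped to [0, 1].
        lam = dot([b - a for a, b in zip(g1, g2)], g2) / denom
        lam = min(1.0, max(0.0, lam))
    direction = [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)]
    return lam, direction


# Two partially conflicting gradients (their inner product is negative).
g1 = [1.0, 0.0]
g2 = [-1.0, 1.0]
lam, direction = min_norm_direction(g1, g2)
# A parameter update would step along -direction; both inner products below are
# positive, so that step decreases both objectives to first order.
print(lam, direction, dot(direction, g1), dot(direction, g2))
```

    The weight lam changes with the gradients at every step, which is the "dynamic weighting" whose variability the abstract connects to generalization.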
    CLEVRER-Humans: Describing Physical and Causal Events the Human Way. (arXiv:2310.03635v1 [cs.AI])
    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    SqueezeLLM: Dense-and-Sparse Quantization. (arXiv:2306.07629v2 [cs.CL] UPDATED)
    Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.
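    The Dense-and-Sparse idea can be sketched as follows: pull a small fraction of large-magnitude weights out into a sparse full-precision store, then quantize the remaining dense part to a few bits. The sketch substitutes plain uniform quantization for the paper's sensitivity-based non-uniform scheme and selects outliers by magnitude only; the function names, the 10% outlier fraction, and the weights are illustrative assumptions.

```python
def dense_and_sparse(weights, outlier_fraction=0.1, bits=3):
    """Split weights into sparse full-precision outliers and uniformly quantized codes."""
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two weights")
    k = min(n - 1, max(1, int(round(outlier_fraction * n))))
    by_magnitude = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    outlier_idx = set(by_magnitude[:k])
    sparse = {i: weights[i] for i in outlier_idx}  # index -> exact value
    dense_vals = [w for i, w in enumerate(weights) if i not in outlier_idx]
    lo, hi = min(dense_vals), max(dense_vals)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Outlier positions get a placeholder code of 0; the sparse store overrides them.
    codes = [
        0 if i in outlier_idx else round((weights[i] - lo) / scale)
        for i in range(n)
    ]
    return codes, lo, scale, sparse


def reconstruct(codes, lo, scale, sparse):
    """Rebuild approximate weights from the quantized codes plus the sparse outliers."""
    return [sparse[i] if i in sparse else lo + c * scale for i, c in enumerate(codes)]


weights = [0.1, -0.2, 0.05, 8.0, 0.15, -0.1, 0.0, 0.2, -0.15, 0.12]
codes, lo, scale, sparse = dense_and_sparse(weights, outlier_fraction=0.1, bits=3)
recon = reconstruct(codes, lo, scale, sparse)
max_error = max(abs(w - r) for w, r in zip(weights, recon))
print(sparse, scale, max_error)
```

    Removing the single outlier shrinks the range that the 3-bit grid has to cover, so the remaining weights are represented with a much finer step.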
    An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples. (arXiv:2310.03349v1 [cs.SD])
    Audio adversarial examples are audio files that have been manipulated to fool an automatic speech recognition (ASR) system, while still sounding benign to a human listener. Most methods to generate such samples are based on a two-step algorithm: first, a viable adversarial audio file is produced, then, this is fine-tuned with respect to perceptibility and robustness. In this work, we present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step. The RIRs are dynamically created by a neural network during the generation process to simulate a physical environment to harden our examples against transformations experienced in over-the-air attacks. We compare the different approaches in three experiments: in a simulated environment and in a realistic over-the-air scenario to evaluate the robustness, and in a human study to evaluate the perceptibility. Our algorithms considering psychoacoustics only or in addition to the robustness show an improvement in the signal-to-noise ratio (SNR) as well as in the human perception study, at the cost of an increased word error rate (WER).
    Towards practical reinforcement learning for tokamak magnetic control. (arXiv:2307.11546v2 [physics.plasm-ph] UPDATED)
    Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method: achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on the work of Degrave et al. (2022), and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach.
    Deep Geometric Learning with Monotonicity Constraints for Alzheimer's Disease Progression. (arXiv:2310.03353v1 [cs.AI])
    Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches regarding data variability and sparsity have yet to consider inherent geometrical properties sufficiently. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reverse occurrences that are particularly problematic in clinical settings. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study.
    Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet. (arXiv:2310.03365v1 [eess.IV])
    Lung cancer is highly lethal, emphasizing the critical need for early detection. However, identifying lung nodules poses significant challenges for radiologists, who rely heavily on their expertise and experience for accurate diagnosis. To address this issue, computer-aided diagnosis systems based on machine learning techniques have emerged to assist doctors in identifying lung nodules from computed tomography (CT) scans. Unfortunately, existing networks in this domain often suffer from computational complexity, leading to high rates of false negatives and false positives, limiting their effectiveness. To address these challenges, we present an innovative model that harnesses the strengths of both convolutional neural networks and vision transformers. Inspired by object detection in videos, we treat each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series application. The primary objective of our work is to overcome hardware limitations during model training, allowing for efficient processing of 2D data while utilizing inter-slice information for accurate identification based on 3D image context. We validated the proposed network by applying a 10-fold cross-validation technique to the publicly available Lung Nodule Analysis 2016 dataset. Our proposed architecture achieves an average sensitivity criterion of 97.84% and a competition performance metric (CPM) of 96.0% with few parameters. Comparative analysis with state-of-the-art advancements in lung nodule identification demonstrates the significant accuracy achieved by our proposed model.
    Paying Attention to Astronomical Transients: Introducing the Time-series Transformer for Photometric Classification. (arXiv:2105.06178v3 [astro-ph.IM] UPDATED)
    Future surveys such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will observe an order of magnitude more astrophysical transient events than any previous survey. With this deluge of photometric data, it will be impossible for all such events to be classified by humans alone. Recent efforts have sought to leverage machine learning methods to tackle the challenge of astronomical transient classification, with ever improving success. Transformers are a recently developed deep learning architecture, first proposed for natural language processing, that have shown a great deal of recent success. In this work we develop a new transformer architecture, which uses multi-head self attention at its core, for general multi-variate time-series data. Furthermore, the proposed time-series transformer architecture supports the inclusion of an arbitrary number of additional features, while also offering interpretability. We apply the time-series transformer to the task of photometric classification, minimising the reliance on expert domain knowledge for feature selection, while achieving results comparable to state-of-the-art photometric classification methods. We achieve a logarithmic-loss of 0.507 on imbalanced data in a representative setting using data from the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC). Moreover, we achieve a micro-averaged receiver operating characteristic area under curve of 0.98 and micro-averaged precision-recall area under curve of 0.87.
    DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks. (arXiv:2310.02027v2 [cs.LG] UPDATED)
    Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, developing a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) techniques such as hyperbolic residual connections and regularization for both weights and features, facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.
    Rethinking Fairness for Human-AI Collaboration. (arXiv:2310.03647v1 [cs.LG])
    Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.
    Modularizing while Training: A New Paradigm for Modularizing DNN Models. (arXiv:2306.09376v3 [cs.LG] UPDATED)
    Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models, borrowing the idea of code reuse in software engineering. However, reusing an entire model could cause extra overhead or inherit weaknesses from undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, and enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and model accuracy loss. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. We have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models in this work. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline. The kernel retention rate of the modules generated by MwT is only 14.58%, with a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half of the baseline.
    Diffeomorphic Multi-Resolution Deep Learning Registration for Applications in Breast MRI. (arXiv:2309.13777v2 [eess.IV] UPDATED)
    In breast surgical planning, accurate registration of MR images across patient positions has the potential to improve the localisation of tumours during breast cancer treatment. While learning-based registration methods have recently become the state-of-the-art approach for most medical image registration tasks, these methods have yet to make inroads into breast image registration due to certain difficulties: the lack of rich texture information in breast MR images and the need for the deformations to be diffeomorphic. In this work, we propose learning strategies for breast MR image registration that are amenable to diffeomorphic constraints, together with early experimental results from in-silico and in-vivo experiments. One key contribution of this work is a registration network which produces superior registration outcomes for breast images in addition to providing diffeomorphic guarantees.
    Learning to Simplify Spatial-Temporal Graphs in Gait Analysis. (arXiv:2310.03396v1 [cs.CV])
    Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy, disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, that can adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
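    As a rough illustration of the Straight-Through Gumbel-Softmax trick mentioned above, the sketch below shows only the forward pass for a single discrete choice (for instance, whether to keep or drop one edge of the adjacency matrix). The function name, logits, and temperature are hypothetical and not taken from the paper. In a real model the backward pass would route gradients through the soft probabilities, which requires an autodiff framework and is not shown here.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Forward pass of a Gumbel-Softmax sample with hard discretization.

    Returns (soft, hard): `soft` is a relaxed probability vector and `hard`
    is its one-hot argmax. In the straight-through variant, `hard` is used
    in the forward pass while gradients flow through `soft` (not shown:
    that part needs automatic differentiation).
    """
    # Gumbel(0, 1) noise: g = -log(-log(u)), with u clamped away from 0.
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]

    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard one-hot vector at the argmax of the soft sample.
    hard_index = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == hard_index else 0.0 for i in range(len(soft))]
    return soft, hard


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits for a three-way choice.
    soft, hard = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
    print(soft, hard)
```

    Lower temperatures push the soft sample closer to one-hot, at the cost of higher-variance gradients.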
    Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning. (arXiv:2310.00247v2 [cs.LG] UPDATED)
    Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements. This is especially true when considering the resource heterogeneity in edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored for FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of given transformer-based FMs to fit heterogeneous resource constraints at the network edge during both FL's optimization and deployment stages. Experimental results demonstrate that RaFFM shows significant superiority in resource utilization efficiency and uses fewer resources to deploy FMs to FL. Despite the lower resource consumption, target models optimized by RaFFM achieve performance on par with traditional FL methods applied to full-sized FMs. This is evident across tasks in both natural language processing and computer vision domains.
    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection. (arXiv:2307.07726v2 [stat.ML] UPDATED)
    While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice during the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results demonstrate our theory's effectiveness.
    Mechanic Maker 2.0: Reinforcement Learning for Evaluating Generated Rules. (arXiv:2309.09476v3 [cs.AI] UPDATED)
    Automated game design (AGD), the study of automatically generating game rules, has a long history in technical games research. AGD approaches generally rely on approximations of human play, either objective functions or AI agents. Despite this, the majority of these approximators are static, meaning they do not reflect human players' ability to learn and improve in a game. In this paper, we investigate the application of Reinforcement Learning (RL) as an approximator for human play for rule generation. We recreate the classic AGD environment Mechanic Maker in Unity as a new, open-source rule generation framework. Our results demonstrate that RL produces distinct sets of rules from an A* agent baseline, which may be more usable by humans.
    TRAM: Bridging Trust Regions and Sharpness Aware Minimization. (arXiv:2310.03646v1 [cs.LG])
    By reducing the curvature of the loss surface in the parameter space, Sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of representations as the optimization target for out-of-domain generalization in a fine-tuning setup. To encourage the retention of transferable representations, we consider trust region-based fine-tuning methods, which exploit task-specific skills without forgetting task-agnostic representations from pre-training. We unify parameter- and representation-space smoothing approaches by using trust region bounds to inform SAM-style regularizers on both of these optimization surfaces. We propose Trust Region Aware Minimization (TRAM), a fine-tuning algorithm that optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure. We find that TRAM outperforms both sharpness-aware and trust region-based optimization methods on cross-domain language modeling and cross-lingual transfer, where robustness to domain transfer and representation generality are critical for success. TRAM establishes a new standard in training generalizable models with minimal additional computation.
    Neural Operators for Accelerating Scientific Simulations and Design. (arXiv:2309.15325v2 [cs.LG] UPDATED)
    Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift by developing fast data-driven surrogate models. In particular, an AI framework, known as neural operators, presents a principled framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDE). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, neural operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since neural operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that neural operators present a transformative approach to simulation and design, enabling rapid research and development.
    Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning. (arXiv:2310.03273v1 [cs.CV])
    Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects. Representation learning methods have often used unsupervised learning to segment an input image into individual objects and encode these objects into each latent vector. However, it is not clear how previous methods have achieved the appropriate segmentation of individual objects. Additionally, most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE). Therefore, it is not clear whether VAE regularization contributes to appropriate object segmentation. To elucidate the mechanism of object segmentation in multi-object representation learning, we conducted an ablation study on MONet, which is a typical method. MONet represents multiple objects using pairs that consist of an attention mask and the latent vector corresponding to the attention mask. Each latent vector is encoded from the input image and attention mask. Then, the component image and attention mask are decoded from each latent vector. The loss function of MONet consists of 1) the sum of reconstruction losses between the input image and decoded component image, 2) the VAE regularization loss of the latent vector, and 3) the reconstruction loss of the attention mask to explicitly encode shape information. We conducted an ablation study on these three loss functions to investigate the effect on segmentation performance. Our results showed that the VAE regularization loss did not affect segmentation performance and the other losses did affect it. Based on this result, we hypothesize that it is important to maximize the attention mask of the image region best represented by a single latent vector corresponding to the attention mask. We confirmed this hypothesis by evaluating a new loss function with the same mechanism as the hypothesis.
    FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts. (arXiv:2306.08586v2 [cs.LG] UPDATED)
    One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet, often, personalization requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have privacy concerns. It, then, remains open how one can achieve just-in-time personalization in these scenarios. We propose FedJETs, a novel solution that uses a Mixture-of-Experts (MoE) framework within an FL setup. Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s). Our gating function harnesses the knowledge of a pretrained model common expert to enhance its routing decisions on-the-fly. As a highlight, our approach can improve accuracy up to 18% in state-of-the-art FL settings, while maintaining competitive zero-shot performance. In practice, our method can handle non-homogeneous data distributions, scale more efficiently, and improve the state-of-the-art performance on common FL benchmarks.
    BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph. (arXiv:2310.03320v1 [cs.LG])
    Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves this by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also find that BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we show that BioBridge presents itself as a general-purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.
    Targeted Adversarial Attacks on Generalizable Neural Radiance Fields. (arXiv:2310.03578v1 [cs.LG])
    Neural Radiance Fields (NeRFs) have recently emerged as a powerful tool for 3D scene representation and rendering. These data-driven models can learn to synthesize high-quality images from sparse 2D observations, enabling realistic and interactive scene reconstructions. However, the growing usage of NeRFs in critical applications such as augmented reality, robotics, and virtual environments could be threatened by adversarial attacks. In this paper we present how generalizable NeRFs can be attacked by both low-intensity adversarial attacks and adversarial patches, where the latter could be robust enough to be used in real-world applications. We also demonstrate successful targeted attacks, in which a specific, predefined output scene is generated by the attack.
    Self-supervised Deep Unrolled Reconstruction Using Regularization by Denoising. (arXiv:2205.03519v3 [eess.IV] UPDATED)
    Deep learning methods have been successfully used in various computer vision tasks. Inspired by that success, deep learning has been explored in magnetic resonance imaging (MRI) reconstruction. In particular, integrating deep learning and model-based optimization methods has shown considerable advantages. However, a large amount of labeled training data is typically needed for high reconstruction quality, which is challenging for some MRI applications. In this paper, we propose a novel reconstruction method, named DURED-Net, that enables interpretable self-supervised learning for MR image reconstruction by combining a self-supervised denoising network and a plug-and-play method. We aim to boost the reconstruction performance of Noise2Noise in MR reconstruction by adding an explicit prior that utilizes imaging physics. Specifically, a denoising network is leveraged for MRI reconstruction using Regularization by Denoising (RED). Experimental results demonstrate that the proposed method requires a reduced amount of training data to achieve high reconstruction quality among state-of-the-art MR reconstruction methods utilizing the Noise2Noise method.
    EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning. (arXiv:2310.03404v1 [cs.LG])
    Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To cover these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies.
    On the Implicit Bias of Adam. (arXiv:2309.00079v3 [cs.LG] UPDATED)
    In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, on the contrary, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.
    SFUSNet: A Spatial-Frequency domain-based Multi-branch Network for diagnosis of Cervical Lymph Node Lesions in Ultrasound Images. (arXiv:2308.16738v2 [eess.IV] UPDATED)
    Booming deep learning has substantially improved the diagnosis for diverse lesions in ultrasound images, but a conspicuous research gap concerning cervical lymph node lesions still remains. The objective of this work is to diagnose cervical lymph node lesions in ultrasound images by leveraging a deep learning model. To this end, we first collected 3392 cervical ultrasound images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named SFUSNet. SFUSNet not only discerns variances in ultrasound images from the spatial domain but also adeptly captures micro-structural alterations across various lesions in the frequency domain. To ascertain the potential of SFUSNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that SFUSNet is the state-of-the-art model and can achieve 92.89% accuracy. Moreover, its average precision, average sensitivity and average specificity for four types of lesions achieve 90.46%, 89.95% and 97.49%, respectively.
    Agent Instructs Large Language Models to be General Zero-Shot Reasoners. (arXiv:2310.03710v1 [cs.CL])
    We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.
    DyVal: Graph-informed Dynamic Evaluation of Large Language Models. (arXiv:2309.17167v2 [cs.AI] UPDATED)
    Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns have been raised about potential data contamination in their large training corpora. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a novel, general, and flexible evaluation protocol for dynamic evaluation of LLMs. Based on our proposed dynamic evaluation framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to ChatGPT and GPT4. Experiments demonstrate that LLMs perform worse in DyVal-generated evaluation samples with different complexities, emphasizing the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on the future evaluation research of LLMs.
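    To make the idea of generating evaluation samples from a directed acyclic graph with controllable complexity more concrete, here is a minimal sketch. It is not DyVal's actual generator: the function name, node naming scheme, operator set, and the use of a binary tree (a simple special case of a DAG) are all assumptions made for illustration. Depth serves as the complexity knob, and the ground-truth answer is computed while the graph is built.

```python
import random


def build_sample(depth, rng):
    """Return (description_lines, value) for a random arithmetic tree.

    Leaves are small integers; internal nodes combine two children with
    +, - or *. Lines are emitted children-first, so every name is defined
    before it is used. `depth` controls the sample's complexity.
    """
    lines = []
    counter = [0]  # mutable counter shared by the nested helper

    def new_node(d):
        name = f"n{counter[0]}"
        counter[0] += 1
        if d == 0:
            value = rng.randint(1, 9)
            lines.append(f"{name} = {value}")
            return name, value
        left_name, left_val = new_node(d - 1)
        right_name, right_val = new_node(d - 1)
        op = rng.choice(["+", "-", "*"])
        value = {"+": left_val + right_val,
                 "-": left_val - right_val,
                 "*": left_val * right_val}[op]
        lines.append(f"{name} = {left_name} {op} {right_name}")
        return name, value

    root, value = new_node(depth)
    lines.append(f"What is the value of {root}?")
    return lines, value


if __name__ == "__main__":
    question, answer = build_sample(depth=2, rng=random.Random(0))
    print("\n".join(question))
    print("ground truth:", answer)
```

    Because each sample is drawn fresh from a seeded generator, the same procedure can produce unseen test items at any chosen depth, which is the property that makes contamination from a static benchmark less of a concern.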
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v4 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    DISCO-10M: A Large-Scale Music Dataset. (arXiv:2306.13512v2 [cs.SD] UPDATED)
    Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
    SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning. (arXiv:2308.00436v3 [cs.AI] UPDATED)
    The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems that require non-linear thinking, even the strongest LLMs make mistakes. To address this, we explore whether LLMs are able to recognize errors in their own step-by-step reasoning, without resorting to external resources. To this end, we propose SelfCheck, a general-purpose zero-shot verification schema for recognizing such errors. We then use the results of these checks to improve question-answering performance by conducting weighted voting on multiple solutions to the question. We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
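    The weighted-voting step described above can be sketched as follows. This is an illustrative assumption rather than SelfCheck's exact scheme: the function name and the idea of representing each sampled solution as an (answer, confidence) pair are hypothetical, and how the confidence is derived from the step-by-step checks is left out.

```python
from collections import defaultdict


def weighted_vote(solutions):
    """Return the answer with the largest total confidence.

    `solutions` is a list of (answer, confidence) pairs, one per sampled
    solution, where confidence is a non-negative score from self-checking.
    """
    totals = defaultdict(float)
    for answer, confidence in solutions:
        totals[answer] += confidence
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Two solutions say "42" and two say "41": a plain majority vote ties,
    # but the confidence weights favour "42" (1.1 versus 0.7).
    samples = [("42", 0.9), ("41", 0.3), ("41", 0.4), ("42", 0.2)]
    print(weighted_vote(samples))
```

    Compared with unweighted majority voting, this lets a few well-verified solutions outweigh a larger number of solutions that failed their checks.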
    Solving a Class of Non-Convex Minimax Optimization in Federated Learning. (arXiv:2310.03613v1 [cs.LG])
    Minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (i.e., single-machine). Nonetheless, algorithms for minimax problems under FL are still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.
    Multimarginal generative modeling with stochastic interpolants. (arXiv:2310.03695v1 [cs.LG])
    Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals. We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure. Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework. The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted. The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption. In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. We demonstrate these capacities with several numerical examples.
    Landscape-Sketch-Step: An AI/ML-Based Metaheuristic for Surrogate Optimization Problems. (arXiv:2309.07936v3 [cs.LG] UPDATED)
    In this paper, we introduce a new heuristic for global optimization in scenarios where extensive evaluations of the cost function are expensive, inaccessible, or even prohibitive. The method, which we call Landscape-Sketch-and-Step (LSS), combines Machine Learning, Stochastic Optimization, and Reinforcement Learning techniques, relying on historical information from previously sampled points to make judicious choices of parameter values at which the cost function should be evaluated. Unlike optimization by Replica Exchange Monte Carlo methods, the number of evaluations of the cost function required in this approach is comparable to that used by Simulated Annealing, a quality that is especially important in contexts like high-throughput computing or high-performance computing tasks, where evaluations are either computationally expensive or take a long time to be performed. The method also differs from standard Surrogate Optimization techniques, for it does not construct a surrogate model that aims at approximating or reconstructing the objective function. We illustrate our method by applying it to low-dimensional optimization problems (dimensions 1, 2, 4, and 8) that mimic known difficulties of minimization on rugged energy landscapes often seen in Condensed Matter Physics, where cost functions are rugged and plagued with local minima. When compared to classical Simulated Annealing, the LSS shows an effective acceleration of the optimization process.
    Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature. (arXiv:2301.09350v2 [cs.CL] UPDATED)
    Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments concept occurrence was the strongest heuristic achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4pp. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.
    Deep Quantum Graph Dreaming: Deciphering Neural Network Insights into Quantum Experiments. (arXiv:2309.07056v2 [quant-ph] UPDATED)
    Despite their promise to facilitate new scientific discoveries, the opaqueness of neural networks presents a challenge in interpreting the logic behind their findings. Here, we use an eXplainable-AI (XAI) technique called inception or deep dreaming, which has been invented in machine learning for computer vision. We use this technique to explore what neural networks learn about quantum optics experiments. Our story begins by training deep neural networks on the properties of quantum systems. Once trained, we "invert" the neural network -- effectively asking how it imagines a quantum system with a specific property, and how it would continuously modify the quantum system to change a property. We find that the network can shift the initial distribution of properties of the quantum system, and we can conceptualize the learned strategies of the neural network. Interestingly, we find that, in the first layers, the neural network identifies simple properties, while in the deeper ones, it can identify complex quantum structures and even quantum entanglement. This is reminiscent of long-understood properties known in computer vision, which we now identify in a complex natural science task. Our approach could be useful in a more interpretable way to develop new advanced AI-based scientific discovery techniques in quantum physics.
    Marginalized Importance Sampling for Off-Environment Policy Evaluation. (arXiv:2309.01807v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots. Even a robust policy trained in simulation requires a real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world. Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio needs to be inferred indirectly, thus exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision and the second term has a small magnitude, thus making it computationally efficient. We analyze the sample complexity as well as error propagation of our two-step procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gaps, target policies and offline data collection policies. We also demonstrate the performance of our algorithm on a Sim2Real task of validating the performance of a 7 DoF robotic arm using offline data along with the Gazebo simulator.
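    The factorization of the density ratio through the simulator occupancy can be illustrated on a toy discrete example. This is a sketch under assumptions: the three occupancy vectors and the rewards are made-up numbers, the exact form of the two terms is one plausible reading of the abstract, and nothing here is learned (the paper estimates both terms from samples).

```python
# Toy occupancies over three state-action pairs (made-up numbers).
d_data_real = [0.5, 0.3, 0.2]    # offline data distribution, real world
d_pi_sim = [0.2, 0.3, 0.5]       # target policy occupancy in the simulator
d_pi_real = [0.25, 0.25, 0.5]    # target policy occupancy in the real world

def ratio(p, q):
    """Elementwise ratio p / q of two discrete distributions."""
    return [pi / qi for pi, qi in zip(p, q)]

# Density ratio needed by marginalized importance sampling.
w_direct = ratio(d_pi_real, d_data_real)

# Same ratio written as a product through the simulator occupancy.
term_sim_over_data = ratio(d_pi_sim, d_data_real)
term_real_over_sim = ratio(d_pi_real, d_pi_sim)
w_product = [a * b for a, b in zip(term_sim_over_data, term_real_over_sim)]

# Reweighting rewards under the offline data recovers the
# target policy's expected reward in the real world.
rewards = [1.0, 0.0, 2.0]
value_is = sum(d * w * r for d, w, r in zip(d_data_real, w_product, rewards))
value_true = sum(d * r for d, r in zip(d_pi_real, rewards))
print(value_is, value_true)
```

    When the simulator is close to the real world, the second factor stays near 1, which is consistent with the abstract's remark that one term has a small magnitude.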
    IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning. (arXiv:2310.02995v2 [cs.LG] UPDATED)
    Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a distinct task performance trade-off. Researchers have discussed how to train particular models to address specific trade-off preferences. However, existing algorithms require training overheads proportional to the number of preferences -- a large burden when there are multiple, possibly infinitely many, preferences. As a response, we propose Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base in the form of a convex hull of model parameter distributions and (2) obtains particular models to address task trade-off preferences in a zero-shot manner. That is, IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters. Moreover, experiments on standard image classification and NLP tasks support this guarantee. Statistically, IBCL improves average per-task accuracy by at most 23% and peak per-task accuracy by at most 15% with respect to the baseline methods, with steadily near-zero or positive backward transfer. Most importantly, IBCL significantly reduces the training overhead from training 1 model per preference to at most 3 models for all preferences.
    Borges and AI. (arXiv:2310.01425v2 [cs.CL] UPDATED)
    Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask whether this mental imagery provides a good description of the phenomenon at hand. Understanding weather patterns through the moods of the gods only goes so far. The present paper instead advocates understanding LLMs and their connection to AI through the imagery of Jorge Luis Borges, a master of 20th century literature, forerunner of magical realism, and precursor to postmodern literature. This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence.
    BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection. (arXiv:2308.12439v2 [cs.CR] UPDATED)
    We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract the backdoor functionality of a given backdoored model into a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 17 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
    Formally Explaining Neural Networks within Reactive Systems. (arXiv:2308.00143v3 [cs.AI] UPDATED)
    Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to ``one-shot'' systems, where the DNN is invoked independently of past invocations, as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.
    Probabilistically Rewired Message-Passing Neural Networks. (arXiv:2310.02156v2 [cs.LG] UPDATED)
    Message-passing graph neural networks (MPNNs) emerged as powerful tools for processing graph-structured input. However, they operate on a fixed input graph structure, ignoring potential noise and missing information. Furthermore, their local aggregation mechanism can lead to problems such as over-squashing and limited expressive power in capturing relevant graph structures. Existing solutions to these challenges have primarily relied on heuristic methods, often disregarding the underlying data distribution. Hence, devising principled approaches for learning to infer graph structures relevant to the given prediction task remains an open challenge. In this work, leveraging recent progress in exact and differentiable $k$-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which learn to add relevant edges while omitting less beneficial ones. For the first time, our theoretical analysis explores how PR-MPNNs enhance expressive power, and we identify precise conditions under which they outperform purely randomized approaches. Empirically, we demonstrate that our approach effectively mitigates issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance compared to traditional MPNN models and recent graph transformer architectures.
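    To give a feel for what sampling a $k$-subset of candidate edges means, here is a small sketch using the Gumbel top-k trick. It is an assumption-laden stand-in: the abstract refers to exact and differentiable $k$-subset sampling, whereas this version is a plain, non-differentiable sampler, and the candidate edges and scores are invented for illustration.

```python
import math
import random

def gumbel_top_k(log_weights, k, rng):
    """Sample k distinct indices without replacement.

    Adds independent Gumbel(0, 1) noise to each log-weight and keeps
    the k largest perturbed values (the Gumbel top-k trick).
    """
    keys = []
    for index, log_w in enumerate(log_weights):
        u = max(rng.random(), 1e-12)        # avoid log(0)
        gumbel = -math.log(-math.log(u))
        keys.append((log_w + gumbel, index))
    keys.sort(reverse=True)
    return sorted(index for _, index in keys[:k])

# Hypothetical candidate edges with scores (log-weights), e.g. as an
# upstream network might produce; higher means more likely to be added.
candidate_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (0, 3)]
scores = [2.0, -1.0, 0.5, 0.0, 50.0]

rng = random.Random(0)
chosen = gumbel_top_k(scores, k=2, rng=rng)
rewired = [candidate_edges[i] for i in chosen]
print(rewired)
```

    Edges with much larger scores are selected almost surely, while edges with similar scores compete stochastically, which is the behaviour a learned rewiring distribution relies on.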
    Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds. (arXiv:2306.03116v2 [cs.HC] UPDATED)
    Learning from crowds refers to settings in which the annotations of training data are obtained through crowd-sourcing services. Multiple annotators each complete their own small part of the annotations, and labeling mistakes that depend on annotators occur frequently. Modeling the label-noise generation process by the noise transition matrix is a powerful tool to tackle the label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, due to the high complexity of annotator- and instance-dependent transition matrices (AIDTM), annotation sparsity, which means each annotator labels only a small fraction of instances, makes modeling AIDTM very challenging. Prior works simplify the problem by assuming the transition matrix is instance-independent or by using simple parametric forms, which lose modeling generality. Motivated by this, we target a more realistic problem, estimating general AIDTM in practice. Without losing modeling generality, we parameterize AIDTM with deep neural networks. To alleviate the modeling challenge, we suppose every annotator shares its noise pattern with similar annotators, and estimate AIDTM via knowledge transfer. We hence first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators. Furthermore, considering that the transfer from the mixture of noise patterns to individuals may cause two annotators with highly different noise generations to perturb each other, we employ the knowledge transfer between identified neighboring annotators to calibrate the modeling. Theoretical analyses are derived to demonstrate that both the knowledge transfer from global to individuals and the knowledge transfer between neighboring individuals can help model general AIDTM. Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
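    As a reminder of what a noise transition matrix does, the sketch below maps a clean class posterior to the distribution of an annotator's noisy label. It assumes a single fixed matrix for one annotator, with entry T[i][j] read as the probability of observing label j when the true class is i. The AIDTM setting in the abstract makes this matrix depend on both the annotator and the instance, which is not modeled here, and the numbers are illustrative.

```python
def noisy_label_distribution(clean_probs, transition):
    """Push a clean class posterior through a noise transition matrix.

    transition[i][j] = P(noisy label = j | true label = i); rows sum to 1.
    """
    n_true = len(transition)
    n_noisy = len(transition[0])
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(n_true))
        for j in range(n_noisy)
    ]

# Illustrative two-class annotator: reliable on class 0, noisier on class 1.
T = [
    [0.9, 0.1],
    [0.3, 0.7],
]
clean = [0.8, 0.2]  # made-up clean posterior for one instance
noisy = noisy_label_distribution(clean, T)
print(noisy)
```

    Training against observed crowd labels through such a matrix lets the classifier output stay on the clean posterior while the matrix absorbs annotator error.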
    Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations. (arXiv:2310.03285v1 [cs.LG])
    Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we introduce three software pre-processing steps to eliminate the attack surface, namely, padding removal, software stripping, and inter-section information resetting. Further, to counter the emerging section injection attacks, we propose a graph-based section-dependent information extraction scheme for software representation. The proposed scheme leverages aggregated information within various sections in the software to enable robust malware detection and mitigate adversarial settings. Our experimental results show that traditional malware detection models are ineffective against adversarial threats. However, the attack surface can be largely reduced by eliminating the volatile information. Therefore, we propose simple-yet-effective methods to mitigate the impacts of binary manipulation attacks. Overall, our graph-based malware detection scheme can accurately detect malware with an area under the curve score of 88.32% and a score of 88.19% under a combination of binary manipulation attacks, exhibiting the efficiency of our proposed scheme.
    Multiple Case Physics-Informed Neural Network for Biomedical Tube Flows. (arXiv:2309.15294v2 [physics.flu-dyn] UPDATED)
    Fluid dynamics computations for tube-like geometries are important for biomedical evaluation of vascular and airway fluid dynamics. Physics-Informed Neural Networks (PINNs) have recently emerged as a good alternative to traditional computational fluid dynamics (CFD) methods. The vanilla PINN, however, requires much longer training time than the traditional CFD methods for each specific flow scenario and thus does not justify its mainstream use. Here, we explore the use of the multi-case PINN approach for calculating biomedical tube flows, where varied geometry cases are parameterized and pre-trained on the PINN, such that results for unseen geometries can be obtained in real time. Our objective is to identify network architecture, tube-specific, and regularization strategies that can optimize this, via experiments on a series of idealized 2D stenotic tube flows.
    Rayleigh Quotient Graph Neural Networks for Graph-level Anomaly Detection. (arXiv:2310.02861v2 [cs.LG] UPDATED)
    Graph-level anomaly detection has gained significant attention as it finds many applications in various domains, such as cancer diagnosis and enzyme prediction. However, existing methods fail to capture the underlying properties of graph anomalies, resulting in unexplainable framework design and unsatisfying performance. In this paper, we take a step back and re-investigate the spectral differences between anomalous and normal graphs. Our main observation shows a significant disparity in the accumulated spectral energy between these two classes. Moreover, we prove that the accumulated spectral energy of the graph signal can be represented by its Rayleigh Quotient, indicating that the Rayleigh Quotient is a driving factor behind the anomalous properties of graphs. Motivated by this, we propose Rayleigh Quotient Graph Neural Network (RQGNN), the first spectral GNN for graph-level anomaly detection, providing a new perspective on exploring the inherent spectral features of anomalous graphs. Specifically, we introduce a novel framework that consists of two components: the Rayleigh Quotient learning component (RQL) and Chebyshev Wavelet GNN with RQ-pooling (CWGNN-RQ). RQL explicitly captures the Rayleigh Quotient of graphs and CWGNN-RQ implicitly explores the spectral space of graphs. Extensive experiments on 10 real-world datasets show that RQGNN outperforms the best rival by 6.74% in Macro-F1 score and 1.44% in AUC, demonstrating the effectiveness of our framework.
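    For readers unfamiliar with the quantity, the Rayleigh quotient of a graph signal $x$ with respect to a graph Laplacian $L$ is $x^\top L x / x^\top x$. The sketch below computes it for the unnormalized Laplacian $L = D - A$, using the identity $x^\top L x = \sum_{(i,j) \in E} (x_i - x_j)^2$. The choice of the unnormalized Laplacian and the toy path graph are assumptions for illustration; the paper may use a normalized variant.

```python
def rayleigh_quotient(edges, x):
    """Rayleigh quotient x^T L x / x^T x for L = D - A.

    edges is a list of undirected edges (i, j), each listed once;
    x is the graph signal, one value per node.
    """
    numerator = sum((x[i] - x[j]) ** 2 for i, j in edges)
    denominator = sum(value * value for value in x)
    return numerator / denominator

# Path graph on three nodes: 0 - 1 - 2.
path_edges = [(0, 1), (1, 2)]

smooth = rayleigh_quotient(path_edges, [1.0, 1.0, 1.0])        # constant signal
medium = rayleigh_quotient(path_edges, [1.0, 0.0, -1.0])       # slowly varying
oscillating = rayleigh_quotient(path_edges, [1.0, -2.0, 1.0])  # sign flips on every edge
print(smooth, medium, oscillating)
```

    The three test signals are Laplacian eigenvectors of this path graph, so the quotient returns the corresponding eigenvalues: smooth signals give small values and oscillating signals give large ones.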
    PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs. (arXiv:2306.08827v2 [cs.LG] UPDATED)
    While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.
    Reconstructing Existing Levels through Level Inpainting. (arXiv:2309.09472v3 [cs.CV] UPDATED)
    Procedural Content Generation (PCG) and Procedural Content Generation via Machine Learning (PCGML) have been used in prior work for generating levels in various games. This paper introduces Content Augmentation and focuses on the subproblem of level inpainting, which involves reconstructing and extending video game levels. Drawing inspiration from image inpainting, we adapt two techniques from this domain to address our specific use case. We present two approaches for level inpainting: an Autoencoder and a U-net. Through a comprehensive case study, we demonstrate their superior performance compared to a baseline method and discuss their relative merits. Furthermore, we provide a practical demonstration of both approaches for the level inpainting task and offer insights into potential directions for future research.
    Sampling via Gradient Flows in the Space of Probability Measures. (arXiv:2310.03597v1 [stat.ML])
    Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
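    The independence from the normalization constant can be seen in the most familiar instance of this family: Langevin dynamics, a standard particle discretization associated with the Wasserstein gradient flow of the Kullback-Leibler divergence. The sketch below is plain unadjusted Langevin sampling on a one-dimensional Gaussian target, not the affine invariant flows or Gaussian approximations developed in the paper. The target, step size, and particle count are arbitrary choices.

```python
import math
import random

def log_unnormalized(x, log_c=0.0):
    """Log density of N(2, 1) up to an arbitrary additive constant log_c."""
    return -0.5 * (x - 2.0) ** 2 + log_c

def score(x, log_c=0.0, eps=1e-5):
    """Central finite difference of the log density; log_c cancels out."""
    upper = log_unnormalized(x + eps, log_c)
    lower = log_unnormalized(x - eps, log_c)
    return (upper - lower) / (2 * eps)

def langevin_samples(n_particles=200, n_steps=400, step=0.05, seed=0):
    """Unadjusted Langevin: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles
    noise_scale = math.sqrt(2 * step)
    for _ in range(n_steps):
        particles = [
            x + step * score(x) + noise_scale * rng.gauss(0.0, 1.0)
            for x in particles
        ]
    return particles

samples = langevin_samples()
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 2))
```

    Only the score (the gradient of the log density) enters the update, so rescaling the target by any constant leaves the sampler unchanged.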
    Generative models for two-ground-truth partitions in networks. (arXiv:2302.02787v3 [cs.SI] UPDATED)
    A myriad of approaches have been proposed to characterise the mesoscale structure of networks - most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network's mesoscale structure. Yet, even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose the stochastic cross-block model (SCBM), a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bi-community and core-periphery structures of different strengths. Given our model design and experimental set-up, we find that the ability to detect the two partitions individually varies by SBM variant and that coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one - in some way dominating - structure can be detected, even in the presence of other partitions. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.
    Numerical Weather Forecasting using Convolutional-LSTM with Attention and Context Matcher Mechanisms. (arXiv:2102.00696v2 [cs.LG] UPDATED)
    Numerical weather forecasting using high-resolution physical models often requires extensive computational resources on supercomputers, which diminishes their wide usage in most real-life applications. As a remedy, applying deep learning methods has revealed innovative solutions within this field. To this end, we introduce a novel deep learning architecture for forecasting high-resolution spatio-temporal weather data. Our approach extends the conventional encoder-decoder structure by integrating Convolutional Long-short Term Memory and Convolutional Neural Networks. In addition, we incorporate attention and context matcher mechanisms into the model architecture. Our Weather Model achieves significant performance improvements compared to baseline deep learning models, including ConvLSTM, TrajGRU, and U-Net. Our experimental evaluation involves high-scale, real-world benchmark numerical weather datasets, namely the ERA5 hourly dataset on pressure levels and WeatherBench. Our results demonstrate substantial improvements in identifying spatial and temporal correlations with attention matrices focusing on distinct parts of the input series to model atmospheric circulations. We also compare our model with high-resolution physical models using the benchmark metrics and show that our Weather Model is accurate and easy to interpret.
    Spatial-temporal associations representation and application for process monitoring using graph convolution neural network. (arXiv:2205.05250v2 [cs.LG] UPDATED)
    Thank you very much for the attention and concern of colleagues and scholars in this work. With the comments and guidance of experts, editors, and reviewers, this work has been accepted for publishing in the journal "Process Safety and Environmental Protection". The theme of this paper relies on the Spatial-temporal associations of numerous variables in the same industrial processes, which refers to numerous variables obtained in dynamic industrial processes with Spatial-temporal correlation characteristics, i.e., these variables are not only highly correlated in time but also interrelated in space. To handle this problem, three key issues need to be well addressed: variable characteristics modeling and representation, graph network construction (temporal information), and graph characteristics perception. The first issue is implemented by assuming the data follows one improved Gaussian distribution, while the graph network can be defined by the monitoring variables and their edges which are calculated by their characteristics in time. Finally, these networks corresponding to process states at different times are fed into a graph convolutional neural network to implement graph classification to achieve process monitoring. A benchmark experiment (Tennessee Eastman chemical process) and one application study (cobalt purification from zinc solution) are employed to demonstrate the feasibility and applicability of this paper.
    Disentangling the Link Between Image Statistics and Human Perception. (arXiv:2303.09874v3 [cs.CV] UPDATED)
    In the 1950s, Barlow and Attneave hypothesised a link between biological vision and information maximisation. Following Shannon, information was defined using the probability of natural images. A number of physiological and psychophysical phenomena have been derived ever since from principles like info-max, efficient coding, or optimal denoising. However, it remains unclear how this link is expressed in mathematical terms from image probability. First, classical derivations were subjected to strong assumptions on the probability models and on the behaviour of the sensors. Moreover, the direct evaluation of the hypothesis was limited by the inability of the classical image models to deliver accurate estimates of the probability. In this work we directly evaluate image probabilities using an advanced generative model for natural images, and we analyse how probability-related factors can be combined to predict human perception via sensitivity of state-of-the-art subjective image quality metrics. We use information theory and regression analysis to find a combination of just two probability-related factors that achieves 0.8 correlation with subjective metrics. This probability-based sensitivity is psychophysically validated by reproducing the basic trends of the Contrast Sensitivity Function, its suprathreshold variation, and trends of the Weber-law and masking.
    OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks. (arXiv:2310.03707v1 [cs.LG])
    Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data to misguide the model into incorrect classifications. Creating these attacks is a challenging task, especially with the ever-increasing complexity of models and datasets. In this work, we introduce a self-supervised, computationally economical method for generating adversarial examples, designed for the unseen black-box setting. Adapting techniques from representation learning, our method generates on-manifold EAs that are encouraged to resemble the data distribution. These attacks are comparable in effectiveness to the state of the art when attacking the model they were trained on, but are significantly more effective when attacking unseen models, as the attacks are more related to the data than to the model itself. Our experiments consistently demonstrate that the method is effective across various models, unseen data categories, and even defended models, suggesting a significant role for on-manifold EAs when targeting unseen models.
    Efficient Biologically Plausible Adversarial Training. (arXiv:2309.17348v3 [cs.LG] UPDATED)
    Artificial Neural Networks (ANNs) trained with Backpropagation (BP) show astounding performance and are increasingly used in performing our daily life tasks. However, ANNs are highly vulnerable to adversarial attacks, which alter inputs with small targeted perturbations that drastically disrupt the models' performance. The most effective method to make ANNs robust against these attacks is adversarial training, in which the training dataset is augmented with exemplary adversarial samples. Unfortunately, this approach has the drawback of increased training complexity since generating adversarial samples is very computationally demanding. In contrast to ANNs, humans are not susceptible to adversarial attacks. Therefore, in this work, we investigate whether biologically-plausible learning algorithms are more robust against adversarial attacks than BP. In particular, we present an extensive comparative analysis of the adversarial robustness of BP and Present the Error to Perturb the Input To modulate Activity (PEPITA), a recently proposed biologically-plausible learning algorithm, on various computer vision tasks. We observe that PEPITA has higher intrinsic adversarial robustness and, with adversarial training, has a more favourable natural-vs-adversarial performance trade-off as, for the same natural accuracies, PEPITA's adversarial accuracies decrease on average by 0.26% and BP's by 8.05%.
    LoRA ensembles for large language model fine-tuning. (arXiv:2310.00035v2 [cs.LG] UPDATED)
    Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there is a huge challenge to ensembling LLMs: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude fewer than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on their own or on top of pre-existing regularization techniques, give consistent improvements in predictive accuracy and uncertainty quantification.
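    The idea of sharing one frozen weight matrix across several low-rank adapters and averaging the members' predictive distributions can be sketched on a single linear layer. Everything here is a toy assumption: the dimensions, the random (untrained) adapters, and the helper names are invented, and a real LoRA ensemble fine-tunes each adapter inside a full LLM.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lora_logits(W, A, B, x):
    """Compute (W + B A) x as W x + B (A x), never forming B A."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

def ensemble_predict(W, adapters, x):
    """Average the class probabilities of all adapter members."""
    member_probs = [softmax(lora_logits(W, A, B, x)) for A, B in adapters]
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

rng = random.Random(0)
d_in, d_out, rank, members = 16, 4, 1, 5

# Shared frozen weights: stored once for the whole ensemble.
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# One (A, B) pair per member; A is rank x d_in, B is d_out x rank.
adapters = [
    (
        [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)],
        [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_out)],
    )
    for _ in range(members)
]

x = [rng.gauss(0, 1) for _ in range(d_in)]
probs = ensemble_predict(W, adapters, x)

base_params = d_in * d_out
adapter_params = rank * (d_in + d_out)
print(probs, base_params, adapter_params)
```

    Each extra member adds only `rank * (d_in + d_out)` parameters on top of the shared matrix, which is the memory argument the abstract makes for ensembling adapters instead of whole models.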
    Learning Robust Statistics for Simulation-based Inference under Model Misspecification. (arXiv:2305.15871v3 [stat.ML] UPDATED)
    Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
    Analysis of learning a flow-based generative model from limited sample complexity. (arXiv:2310.03575v1 [stat.ML])
    We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
    A Comprehensive Survey of Dataset Distillation. (arXiv:2301.05603v3 [cs.LG] UPDATED)
    Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing resources encourage advanced algorithms to deal with massive data. However, it has gradually become challenging to handle the unlimited growth of data with limited computing power. To this end, diverse approaches are proposed to improve data processing efficiency. Dataset distillation, a dataset reduction method, addresses this problem by synthesizing a small typical dataset from substantial data and has attracted much attention from the deep learning community. Existing dataset distillation methods can be taxonomized into meta-learning and data matching frameworks according to whether they explicitly mimic the performance of target data. Although dataset distillation has shown surprising performance in compressing datasets, there are still several limitations such as distilling high-resolution data or data with complex label spaces. This paper provides a holistic understanding of dataset distillation from multiple aspects, including distillation frameworks and algorithms, factorized dataset distillation, performance comparison, and applications. Finally, we discuss challenges and promising directions to further promote future studies on dataset distillation.
    Adversarial Machine Learning for Social Good: Reframing the Adversary as an Ally. (arXiv:2310.03614v1 [cs.LG])
    Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulnerabilities in various settings and modalities. In addition, DNNs have also been found to incorporate embedded bias and often produce unexplainable predictions, which can result in anti-social AI applications. The emergence of new AI technologies that leverage Large Language Models (LLMs), such as ChatGPT and GPT-4, increases the risk of producing anti-social applications at scale. AdvML for Social Good (AdvML4G) is an emerging field that repurposes the AdvML bug to invent pro-social applications. Regulators, practitioners, and researchers should collaborate to encourage the development of pro-social applications and hinder the development of anti-social ones. In this work, we provide the first comprehensive review of the emerging field of AdvML4G. This paper encompasses a taxonomy that highlights the emergence of AdvML4G, a discussion of the differences and similarities between AdvML4G and AdvML, a taxonomy covering social good-related concepts and aspects, an exploration of the motivations behind the emergence of AdvML4G at the intersection of ML4G and AdvML, and an extensive summary of the works that utilize AdvML4G as an auxiliary tool for innovating pro-social applications. Finally, we elaborate upon various challenges and open research issues that require significant attention from the research community.
    Decoding speech perception from non-invasive brain recordings. (arXiv:2208.12266v2 [eess.AS] UPDATED)
    Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major milestones in that regard: deep learning algorithms trained on intracranial recordings now start to decode elementary linguistic features (e.g. letters, words, spectrograms). However, extending this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from the non-invasive recordings of a large cohort of healthy individuals. To evaluate this approach, we curate and integrate four public datasets, encompassing 175 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to short stories and isolated sentences. The results show that our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities on average across participants, and more than 80% in the very best participants - a performance that allows the decoding of words and phrases absent from the training set. The comparison of our model to a variety of baselines highlights the importance of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across multiple participants. Finally, the analysis of the decoder's predictions suggests that they primarily depend on lexical and contextual semantic representations. Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.
    Regression with Label Differential Privacy. (arXiv:2212.06074v3 [cs.LG] UPDATED)
    We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
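A minimal sketch of what a "randomized response on bins" mechanism could look like, for illustration only. It assumes the bin values are already given (the abstract's algorithm for finding the optimal bins is not reproduced here) and that each label is snapped to its nearest bin before k-ary randomized response is applied. The function name `rr_on_bins`, the nearest-bin rule, and the example bin values are assumptions, not taken from the paper.

```python
import math
import random

def rr_on_bins(label, bins, epsilon, rng):
    """Snap `label` to its nearest bin value, then apply k-ary randomized response."""
    k = len(bins)
    # Assumption: the label is mapped to the closest bin value.
    true_idx = min(range(k), key=lambda i: abs(bins[i] - label))
    if k == 1:
        return bins[0]
    # Probability of reporting the true bin under epsilon-DP randomized response.
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return bins[true_idx]
    # Otherwise report one of the other k - 1 bins uniformly at random.
    others = [i for i in range(k) if i != true_idx]
    return bins[rng.choice(others)]

rng = random.Random(0)
bins = [0.0, 5.0, 10.0]  # hypothetical bin values
noisy = [rr_on_bins(4.2, bins, epsilon=1.0, rng=rng) for _ in range(1000)]
```

Every released value is one of the bin values, and a smaller `epsilon` makes the true bin less likely to be reported.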
    Unsupervised Foreground Extraction via Deep Region Competition. (arXiv:2110.15497v4 [cs.CV] UPDATED)
    We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink the foreground extraction by reconciling energy-based prior with generative image modeling in the form of Mixture of Experts (MoE), where we further introduce the learned pixel re-assignment as the essential inductive bias to capture the regularities of background regions. With this modeling, the foreground-background partition can be naturally found through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach for generic image segmentation. Experiments demonstrate that DRC exhibits more competitive performances on complex real-world data and challenging multi-object scenes compared with prior methods. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects even from categories unseen during training.
    Algebraic and Geometric Models for Space Networking. (arXiv:2304.01150v2 [math.AT] UPDATED)
    In this paper we introduce some new algebraic and geometric perspectives on networked space communications. Our main contribution is a novel definition of a time-varying graph (TVG), defined in terms of a matrix with values in subsets of the real line P(R). We leverage semi-ring properties of P(R) to model multi-hop communication in a TVG using matrix multiplication and a truncated Kleene star. This leads to novel statistics on the communication capacity of TVGs called lifetime curves, which we generate for large samples of randomly chosen STARLINK satellites, whose connectivity is modeled over day-long simulations. Determining when a large subsample of STARLINK is temporally strongly connected is further analyzed using novel metrics introduced here that are inspired by topological data analysis (TDA). To better model networking scenarios between the Earth and Mars, we introduce various semi-rings capable of modeling propagation delay as well as protocols common to Delay Tolerant Networking (DTN), such as store-and-forward. Finally, we illustrate the applicability of zigzag persistence for featurizing different space networks and demonstrate the efficacy of K-Nearest Neighbors (KNN) classification for distinguishing Earth-Mars and Earth-Moon satellite systems using time-varying topology alone.
    Linking Across Data Granularity: Fitting Multivariate Hawkes Processes to Partially Interval-Censored Data. (arXiv:2111.02062v3 [cs.LG] UPDATED)
    The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known, referred to as partially interval-censored data. The MHP is unsuitable for handling such data since its estimation requires event timestamps. In this study, we introduce the Partial Mean Behavior Poisson (PMBP) process, a novel point process which shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PMBP process using synthetic and real-world datasets. Firstly, we illustrate that the PMBP process can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PMBP process in predicting YouTube popularity and find that it surpasses state-of-the-art methods. Lastly, we leverage the PMBP process to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PMBP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
    Unpaired Image-to-Image Translation via Neural Schrödinger Bridge. (arXiv:2305.15086v2 [cs.CV] UPDATED)
    Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in unpaired image-to-image translation tasks due to the Gaussian prior assumption. The Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has emerged as an attractive solution to this problem. However, none of the SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schrödinger Bridge (UNSB), which expresses the SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn an SB between unpaired data. We demonstrate that UNSB is scalable and successfully solves various unpaired image-to-image translation tasks. Code: https://github.com/cyclomon/UNSB
    Distribution-free risk assessment of regression-based machine learning algorithms. (arXiv:2310.03545v1 [cs.LG])
    Machine learning algorithms have grown in sophistication over the years and are increasingly deployed for real-life applications. However, when using machine learning techniques in practical settings, particularly in high-risk applications such as medicine and engineering, obtaining the failure probability of the predictive model is critical. We refer to this problem as the risk-assessment task. We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction. We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability. Using this coverage property, we prove that our approximated failure probability is conservative in the sense that it is not lower than the true failure probability of the ML algorithm. We conduct extensive experiments to empirically study the accuracy of the proposed method for problems with and without covariate shift. Our analysis focuses on different modeling regimes, dataset sizes, and conformal prediction methodologies.
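For readers unfamiliar with the coverage property mentioned above, here is a minimal sketch of standard split conformal prediction for regression. It is not the paper's failure-probability estimator. The calibration data and the function names `conformal_quantile` and `prediction_interval` are hypothetical, and the sketch assumes absolute residuals as the nonconformity score.

```python
import math

def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute calibration residuals."""
    n = len(residuals)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]

def prediction_interval(y_hat, q):
    """Symmetric interval around the model's prediction."""
    return (y_hat - q, y_hat + q)

# Hypothetical held-out calibration predictions and true labels.
cal_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_true = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

q = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(10.0, q)
```

Under exchangeability of calibration and test points, the interval `(low, high)` contains the true label with probability at least `1 - alpha`. That guarantee can fail under covariate shift, the second setting the abstract studies.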
    Time-Varying Propensity Score to Bridge the Gap between the Past and Present. (arXiv:2210.01422v4 [cs.LG] UPDATED)
    Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.
    On Convergence of Federated Averaging Langevin Dynamics. (arXiv:2112.05120v4 [stat.ML] UPDATED)
    We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe pairwise trade-offs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
    TPDR: A Novel Two-Step Transformer-based Product and Class Description Match and Retrieval Method. (arXiv:2310.03491v1 [cs.IR])
    There is a niche of companies responsible for intermediating the purchase of large batches of varied products for other companies, for which the main challenge is to perform product description standardization, i.e., matching an item described by a client with a product described in a catalog. The problem is complex since the client's product description may be: (1) potentially noisy; (2) short and uninformative (e.g., missing information about model and size); and (3) cross-language. In this paper, we formalize this problem as a ranking task: given an initial client product specification (IS, the query), return the most appropriate standardized descriptions (SD, the response). We propose TPDR, a two-step Transformer-based Product and Class Description Retrieval method that is able to explore the semantic correspondence between IS and SD by exploiting attention mechanisms and contrastive learning. First, TPDR employs transformers as two encoders sharing the embedding vector space: one for encoding the IS and another for the SD, in which corresponding pairs (IS, SD) must be close in the vector space. Closeness is further enforced by a contrastive learning mechanism leveraging a specialized loss function. TPDR also exploits a second, re-ranking step based on syntactic features that are very important for the exact matching (model, dimension) of certain products and that may have been neglected by the transformers. To evaluate our proposal, we consider 11 datasets from a real company, covering different application contexts. Our solution was able to retrieve the correct standardized product before the 5th ranking position in 71% of the cases and its correct category in the first position in 80% of the situations. Moreover, the effectiveness gains over purely syntactic or semantic baselines reach up to 3.7 times, solving cases that neither approach can solve in isolation.
    Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models. (arXiv:2310.03546v1 [stat.ML])
    Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
    Pre-Training and Fine-Tuning Generative Flow Networks. (arXiv:2310.03419v1 [cs.LG])
    Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an important open challenge about how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.
    GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation. (arXiv:2303.14655v2 [cs.CV] UPDATED)
    Despite the recent emergence of video captioning models, generating vivid, fine-grained video descriptions based on background knowledge (i.e., long and informative commentary about domain-specific scenes with appropriate reasoning) is still far from solved, even though it has valuable applications such as automatic sports narration. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples that supports a challenging new task setting, Knowledge-grounded Video Captioning (KGVC). Moreover, we experimentally adapt existing methods to show the difficulty of this valuable and applicable task and potential directions for solving it. Our data and code are available at https://github.com/THU-KEG/goal.
    GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks. (arXiv:2310.03399v1 [cs.LG])
    Graph neural networks (GNNs) learn the representation of nodes in a graph by aggregating the neighborhood information in various ways. As these networks grow in depth, their receptive field grows exponentially due to the increase in neighborhood sizes, resulting in high memory costs. Graph sampling solves memory issues in GNNs by sampling a small ratio of the nodes in the graph. This way, GNNs can scale to much larger graphs. Most sampling methods focus on fixed sampling heuristics, which may not generalize to different structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objectives. We evaluate GRAPES across several small- and large-scale graph benchmarks and demonstrate its effectiveness in accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and, therefore, can scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
    Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning. (arXiv:2310.03400v1 [cs.LG])
    Nowadays, billions of people engage in communication and express their opinions on the internet daily. Unfortunately, not all of these expressions are friendly or compliant, making content moderation an indispensable task. With the successful development of Large Language Models (LLMs) in recent years, LLM-based methods have become a feasible solution for handling tasks in various domains. However, in the field of content moderation, there is still a lack of detailed work that systematically introduces implementation details. In this paper, we introduce how to fine-tune an LLM model that can be privately deployed for content moderation. Specifically, we discuss whether incorporating reasons during the fine-tuning process would be better or if it should be treated as a classification task directly. We also explore the benefits of utilizing reasons generated by more powerful LLMs for fine-tuning privately deployed models and the impact of different processing approaches when the answers generated by the more powerful LLMs are incorrect. We report the entire research process and the key findings in this paper, hoping to provide valuable experience for researchers who are fine-tuning privately deployed models in their domain-specific research.
    Investigating the Limitation of CLIP Models: The Worst-Performing Categories. (arXiv:2310.03324v1 [cs.CV])
    Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0%, even though the overall performance has achieved 64.1%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (CMM) to measure the inference confusion. CMM can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2%, without manual prompt engineering, laborious optimization, or access to labeled validation data.
    Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations. (arXiv:2310.03393v1 [math.NA])
    Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.
    Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective. (arXiv:2310.03556v1 [stat.ML])
    Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data are insufficient, it is desirable to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to reliably accelerate the optimization. The performance of the proposed method and models is demonstrated on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
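The singularity issue described above can be illustrated with a small sketch. It assumes one-dimensional Gaussian kernels centred on the data points, equal kernel weights, and made-up data; the function names are hypothetical and this is not the paper's implementation. Shrinking one kernel's bandwidth inflates the regular log-likelihood without bound, while the leave-one-out version gains nothing from it.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def full_log_likelihood(data, bandwidths):
    """Regular log-likelihood: each point is also scored by its own kernel."""
    n = len(data)
    return sum(
        math.log(sum(gaussian_pdf(x, data[j], bandwidths[j]) for j in range(n)) / n)
        for x in data
    )

def loo_log_likelihood(data, bandwidths):
    """Leave-one-out log-likelihood: point i is scored without kernel i."""
    n = len(data)
    total = 0.0
    for i, x in enumerate(data):
        density = sum(
            gaussian_pdf(x, data[j], bandwidths[j]) for j in range(n) if j != i
        ) / (n - 1)
        total += math.log(density)
    return total

data = [0.0, 1.0, 2.0, 3.0]      # hypothetical samples
wide = [1.0, 1.0, 1.0, 1.0]      # one bandwidth per kernel
narrow = [1e-6, 1.0, 1.0, 1.0]   # first kernel collapsed onto its own point

full_gain = full_log_likelihood(data, narrow) - full_log_likelihood(data, wide)
loo_gain = loo_log_likelihood(data, narrow) - loo_log_likelihood(data, wide)
```

Here `full_gain` is positive and grows as the first bandwidth shrinks further, whereas `loo_gain` is negative, so the collapsed kernel is penalized rather than rewarded.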
    A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling. (arXiv:2310.03298v1 [stat.ML])
    Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: we first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximizes the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.
    Text as Environment: A Deep Reinforcement Learning Text Readability Assessment Model. (arXiv:1912.05957v3 [cs.CL] UPDATED)
    Evaluating the readability of a text can significantly facilitate the precise expression of information in written form. The formulation of text readability assessment involves the identification of meaningful properties of the text regardless of its length. Sophisticated features and models are used to evaluate the comprehensibility of texts accurately. Despite this, the problem of assessing texts' readability efficiently remains relatively untouched. The efficiency of state-of-the-art text readability assessment models can be further improved using deep reinforcement learning models. Using a hard attention-based active inference technique, the proposed approach makes efficient use of input text and computational resources. Through the use of semi-supervised signals, the reinforcement learning model uses the minimum amount of text needed to determine a text's readability. A comparison of the model on Weebit and Cambridge Exams with state-of-the-art models, such as the BERT text readability model, shows that it is capable of achieving state-of-the-art accuracy with a significantly smaller amount of input text than other models.
    Neural Language Model Pruning for Automatic Speech Recognition. (arXiv:2310.03424v1 [cs.LG])
    We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses on large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models, and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven pruning in several scenarios; b) incremental pruning achieves higher accuracy compared to one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
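    The size reduction that low-rank approximation offers can be illustrated with simple parameter counting. The sketch below is generic and is not the variant proposed in the paper: it assumes a dense weight matrix of shape rows x cols is replaced by two factors of shapes rows x rank and rank x cols. The function names and the example layer size are hypothetical.

```python
def dense_params(rows, cols):
    """Number of parameters in a dense rows x cols weight matrix."""
    return rows * cols


def low_rank_params(rows, cols, rank):
    """Parameters after factoring W (rows x cols) into A (rows x rank) @ B (rank x cols)."""
    return rank * (rows + cols)


def max_rank_for_ratio(rows, cols, ratio):
    """Largest rank whose factorized size is at most ratio * dense size."""
    return int(ratio * rows * cols // (rows + cols))


if __name__ == "__main__":
    rows, cols = 1024, 4096  # hypothetical feed-forward layer shape
    for rank in (64, 128, 256):
        kept = low_rank_params(rows, cols, rank) / dense_params(rows, cols)
        print(f"rank={rank}: keeps {kept:.3f} of the dense parameters")
```

    Lowering the rank step by step yields a sequence of progressively smaller models, which is the general idea behind delivering multiple target sizes from one compression run.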
    Probabilistic Forecasting of Day-Ahead Electricity Prices and their Volatility with LSTMs. (arXiv:2310.03339v1 [cs.LG])
    Accurate forecasts of electricity prices are crucial for the management of electric power systems and the development of smart applications. European electricity prices have risen substantially and become highly volatile after the Russian invasion of Ukraine, challenging established forecasting methods. Here, we present a Long Short-Term Memory (LSTM) model for the German-Luxembourg day-ahead electricity prices addressing these challenges. The recurrent structure of the LSTM allows the model to adapt to trends, while the joint prediction of both mean and standard deviation enables a probabilistic prediction. Using a physics-inspired approach - superstatistics - to derive an explanation for the statistics of prices, we show that the LSTM model faithfully reproduces both prices and their volatility.
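    A model that predicts both a mean and a standard deviation is commonly trained by minimizing a Gaussian negative log-likelihood. The abstract does not state which loss the authors use, so the sketch below is an assumption that only illustrates how a joint (mean, standard deviation) output can be scored; the function names are hypothetical.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def mean_nll(ys, mus, sigmas):
    """Average NLL over a batch of observations and predicted (mu, sigma) pairs."""
    return sum(gaussian_nll(y, m, s) for y, m, s in zip(ys, mus, sigmas)) / len(ys)


if __name__ == "__main__":
    # A confident but wrong forecast is penalized far more than an uncertain one.
    print(gaussian_nll(10.0, 0.0, 0.1))
    print(gaussian_nll(10.0, 0.0, 5.0))
```

    Under this loss, the predicted standard deviation is rewarded for being large when errors are large and small when errors are small, which is what makes the forecast probabilistic rather than a point estimate.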
    Deep Controlled Learning for Inventory Control. (arXiv:2011.15122v6 [cs.LG] UPDATED)
    Problem Definition: Are traditional deep reinforcement learning (DRL) algorithms, developed for a broad range of purposes including game-play and robotics, the most suitable machine learning algorithms for applications in inventory control? To what extent would DRL algorithms tailored to the unique characteristics of inventory control problems provide superior performance compared to DRL and traditional benchmarks? Methodology/results: We propose and study Deep Controlled Learning (DCL), a new DRL framework based on approximate policy iteration specifically designed to tackle inventory problems. Comparative evaluations reveal that DCL outperforms existing state-of-the-art heuristics in lost sales inventory control, perishable inventory systems, and inventory systems with random lead times, achieving lower average costs across all test instances and maintaining an optimality gap of no more than 0.1%. Notably, the same hyperparameter set is utilized across all experiments, underscoring the robustness and generalizability of the proposed method. Managerial implications: These substantial performance and robustness improvements pave the way for the effective application of tailored DRL algorithms to inventory management problems, empowering decision-makers to optimize stock levels, minimize costs, and enhance responsiveness across various industries.
    FedNAR: Federated Optimization with Normalized Annealing Regularization. (arXiv:2310.03163v1 [cs.LG])
    Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop Federated optimization with Normalized Annealing Regularization (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our code is released at https://github.com/ljb121002/fednar.
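    One way to read "co-clipping of the gradient and weight decay" is that the gradient and the weight decay term are summed first and the norm of the combined update is then bounded. The sketch below follows that reading for a single local step. It is an illustration under that assumption, not the authors' implementation (their released code is the reference), and the function name and parameters are hypothetical.

```python
import math


def co_clipped_step(weights, grads, lr, weight_decay, max_norm):
    """One local SGD step where gradient and weight decay are clipped together.

    Assumption: the combined update (grad + weight_decay * w) is rescaled so
    that its L2 norm never exceeds max_norm.
    """
    update = [g + weight_decay * w for w, g in zip(weights, grads)]
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [w - lr * scale * u for w, u in zip(weights, update)]


if __name__ == "__main__":
    # With zero gradient the update is pure weight decay; its norm (5.0) is clipped to 1.0.
    print(co_clipped_step([3.0, 4.0], [0.0, 0.0], lr=1.0, weight_decay=1.0, max_norm=1.0))
```

    Clipping the sum rather than the gradient alone means an overly large weight decay cannot dominate the step, which is consistent with the self-adjusting behaviour the abstract describes.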
    Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End. (arXiv:2310.03581v1 [cs.RO])
    Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate by over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.
    TimeGPT-1. (arXiv:2310.03589v1 [cs.LG])
    In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.
    Conditional Generative Models for Simulation of EMG During Naturalistic Movements. (arXiv:2211.01856v4 [cs.LG] UPDATED)
    Numerical models of electromyographic (EMG) signals have provided a huge contribution to our fundamental understanding of human neurophysiology and remain a central pillar of motor neuroscience and the development of human-machine interfaces. However, whilst modern biophysical simulations based on finite element methods are highly accurate, they are extremely computationally expensive and thus are generally limited to modelling static systems such as isometrically contracting limbs. As a solution to this problem, we propose a transfer learning approach, in which a conditional generative model is trained to mimic the output of an advanced numerical model. To this end, we present BioMime, a conditional generative neural network trained adversarially to generate motor unit activation potential waveforms under a wide variety of volume conductor parameters. We demonstrate the ability of such a model to predictively interpolate between a much smaller number of numerical model's outputs with a high accuracy. Consequently, the computational load is dramatically reduced, which allows the rapid simulation of EMG signals during truly dynamic and naturalistic movements.
    A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions. (arXiv:2310.03281v1 [cs.LG])
    The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experiment results confirmed that our top designs achieved a 32.5% increase in protein production level relative to well-established 5' UTR optimized for therapeutics.
    High-dimensional Bayesian Optimization with Group Testing. (arXiv:2310.03515v1 [cs.LG])
    Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
    Combining Differential Privacy and Byzantine Resilience in Distributed SGD. (arXiv:2110.03991v4 [cs.LG] UPDATED)
    Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(\alpha,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
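    The setting described above can be sketched in a few lines: honest workers clip their gradients and add Gaussian noise for differential privacy, and the server combines all submissions with a Byzantine-robust rule. The coordinate-wise median used here is only one familiar robust aggregator chosen for illustration; the paper analyses $(\alpha,f)$-Byzantine-resilient rules in general. The function names, noise scale and example values are hypothetical.

```python
import math
import random
import statistics


def clip(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, max_norm, noise_std, rng):
    """Honest worker: clip the gradient, then add Gaussian noise per coordinate."""
    return [g + rng.gauss(0.0, noise_std) for g in clip(grad, max_norm)]


def coordinate_median(grads):
    """Server: coordinate-wise median over all submitted gradients."""
    return [statistics.median(col) for col in zip(*grads)]


if __name__ == "__main__":
    rng = random.Random(0)
    honest = [privatize([1.0, -1.0], max_norm=2.0, noise_std=0.1, rng=rng) for _ in range(4)]
    byzantine = [[100.0, 100.0]]  # a single malicious submission
    print(coordinate_median(honest + byzantine))
```

    Increasing `noise_std` spreads the honest gradients apart, which makes them harder to tell from malicious ones; this is the tension between DP and BR that the abstract points to.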
    Benchmarking Large Language Models As AI Research Agents. (arXiv:2310.03302v1 [cs.LG])
    Scientific experimentation involves an iterative process of creating hypotheses, designing experiments, running experiments, and analyzing the results. Can we build AI research agents to perform these long-horizon tasks? To take a step towards building and evaluating research agents on such open-ended decision-making tasks, we focus on the problem of machine learning engineering: given a task description and a dataset, build a high-performing model. In this paper, we propose MLAgentBench, a suite of ML tasks for benchmarking AI research agents. Agents can perform actions like reading/writing files, executing code, and inspecting outputs. With these actions, agents could run experiments, analyze the results, and modify the code of entire machine learning pipelines, such as data processing, architecture, training processes, etc. The benchmark then automatically evaluates the agent's performance objectively over various metrics related to performance and efficiency. We also design an LLM-based research agent to automatically perform experimentation loops in such an environment. Empirically, we find that a GPT-4-based research agent can feasibly build compelling ML models over many tasks in MLAgentBench, displaying highly interpretable plans and actions. However, the success rates vary considerably; they span from almost 90% on well-established older datasets to as low as 10% on recent Kaggle Challenges -- unavailable during the LLM model's pretraining -- and even 0% on newer research challenges like BabyLM. Finally, we identify several key challenges for LLM-based research agents such as long-term planning and hallucination. Our code is released at https://github.com/snap-stanford/MLAgentBench.
    Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance. (arXiv:2304.06715v3 [cs.LG] UPDATED)
    Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.
    OpenPatch: a 3D patchwork for Out-Of-Distribution detection. (arXiv:2310.03388v1 [cs.CV])
    Moving deep learning models from the laboratory setting to the open world entails preparing them to handle unforeseen conditions. In several applications the occurrence of novel classes during deployment poses a significant threat, thus it is crucial to effectively detect them. Ideally, this skill should be used when needed without requiring any further computational training effort at every new task. Out-of-distribution detection has attracted significant attention in recent years; however, the majority of studies deal with 2D images, ignoring the inherent 3D nature of the real world and often confusing domain and semantic novelty. In this work, we focus on the latter, considering the objects' geometric structure captured by 3D point clouds regardless of the specific domain. We advance the field by introducing OpenPatch, which builds on a large pre-trained model and simply extracts from its intermediate features a set of patch representations that describe each known class. For any new sample, we obtain a novelty score by evaluating whether it can be recomposed mainly by patches of a single known class or rather via the contribution of multiple classes. We present an extensive experimental evaluation of our approach for the task of semantic novelty detection on real-world point cloud samples when the reference known data are synthetic. We demonstrate that OpenPatch excels in both the full and few-shot known sample scenarios, showcasing its robustness across varying pre-training objectives and network backbones. The inherent training-free nature of our method allows for its immediate application to a wide array of real-world tasks, offering a compelling advantage over approaches that need expensive retraining efforts.
    A Framework for Large Scale Synthetic Graph Dataset Generation. (arXiv:2210.01944v4 [cs.LG] UPDATED)
    Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured datasets, most of which are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool to scale the datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns a series of parametric models from proprietary datasets that can be released to researchers to study various graph methods on the synthetic data, increasing prototype development and novel applications. We demonstrate the generalizability of the framework across a series of datasets, mimicking structural and feature distributions as well as the ability to scale them across varying sizes, demonstrating their usefulness for benchmarking and model development. Code can be found at https://github.com/NVIDIA/DeepLearningExamples/tree/master/Tools/DGLPyTorch/SyntheticGraphGeneration.
    Two-stage LLM Fine-tuning with Less Specialization and More Generalization. (arXiv:2211.00635v2 [cs.CL] UPDATED)
    Pretrained large language models (LLMs) are general purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performance, which is undesirable whenever the fine-tuned model needs to handle additional tasks where no fine-tuning data is available. In this work, we first demonstrate that fine-tuning on a single task indeed decreases LLMs' general in-context learning performance. We discover one important cause of such forgetting, format specialization, where the model overfits to the format of the fine-tuned task. We further show that format specialization happens at the very beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that reduces format specialization and improves generalization. ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt attached. With experiments on several fine-tuning tasks and 8 in-context evaluation tasks, we show that ProMoT achieves comparable performance on fine-tuned tasks to standard fine-tuning, but with much less loss of in-context learning performance across a broad range of out-of-domain evaluation tasks. More importantly, ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task, e.g. ProMoT on En-Fr translation significantly improves performance on other language pairs, and ProMoT on NLI improves performance on summarization. Experiments also show that ProMoT can improve the generalization performance of multi-task training.
    Enhancing Adversarial Robustness via Score-Based Optimization. (arXiv:2307.04333v2 [cs.LG] UPDATED)
    Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations. Developing algorithms that can mitigate the effects of these attacks is crucial for ensuring the safe use of artificial intelligence. Recent studies have suggested that score-based diffusion models are effective in adversarial defenses. However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test-time, towards original clean data in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100 and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
    Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition. (arXiv:2310.03033v1 [cs.CV])
    Traffic signs play a critical role in road safety and traffic management for autonomous driving systems. Accurate traffic sign classification is essential but challenging due to real-world complexities like adversarial examples and occlusions. To address these issues, binary neural networks offer promise in constructing classifiers suitable for resource-constrained devices. In our previous work, we proposed high-accuracy BNN models for traffic sign recognition, focusing on compact size for limited computation and energy resources. To evaluate their local robustness, this paper introduces a set of benchmark problems featuring layers that challenge state-of-the-art verification tools. These layers include binarized convolutions, max pooling, batch normalization, and fully connected layers. The difficulty of the verification problem stems from the large number of network parameters (905k - 1.7 M), the input dimension (2.7k-12k), and the number of regions (43), as well as from the fact that the neural networks are not sparse. The proposed BNN models and local robustness properties can be checked at https://github.com/ChristopherBrix/vnncomp2023_benchmarks/tree/main/benchmarks/traffic_signs_recognition. The results of the 4th International Verification of Neural Networks Competition (VNN-COMP'23) revealed that 4 out of 7 solvers can handle many of our randomly selected benchmarks (minimum is 6, maximum is 36, out of 45). Surprisingly, the tools also output wrong results or missing counterexamples (ranging from 1 to 4). Currently, our focus lies in exploring the possibility of achieving a greater count of solved instances by extending the allotted time (previously set at 8 minutes). Furthermore, we are intrigued by the reasons behind the erroneous outcomes provided by the tools for certain benchmarks.
    Comparing Time-Series Analysis Approaches Utilized in Research Papers to Forecast COVID-19 Cases in Africa: A Literature Review. (arXiv:2310.03606v1 [cs.LG])
    This literature review aimed to compare various time-series analysis approaches utilized in forecasting COVID-19 cases in Africa. The study involved a methodical search for English-language research papers published between January 2020 and July 2023, focusing specifically on papers that utilized time-series analysis approaches on COVID-19 datasets in Africa. A variety of databases including PubMed, Google Scholar, Scopus, and Web of Science were utilized for this process. The research papers underwent an evaluation process to extract relevant information regarding the implementation and performance of the time-series analysis models. The study highlighted the different methodologies employed, evaluating their effectiveness and limitations in forecasting the spread of the virus. The result of this review could contribute deeper insights into the field, and future research should consider these insights to improve time series analysis models and explore the integration of different approaches for enhanced public health decision-making.
    Sparse Deep Learning for Time Series Data: Theory and Applications. (arXiv:2310.03243v1 [stat.ML])
    Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    Robust Representation Learning via Asymmetric Negative Contrast and Reverse Attention. (arXiv:2310.03358v1 [cs.CV])
    Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy to protect neural networks from being fooled. However, we find that AT fails to learn robust features, resulting in poor adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) exclusion: the features of natural examples keep away from those of other classes; (2) alignment: the features of natural and corresponding adversarial examples are close to each other. These motivate us to propose a generic framework of AT to gain robust representation, by the asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities, to push away examples of different classes in the feature space. Moreover, we propose to weight features by the parameters of the linear classifier as the reverse attention, to obtain class-aware features and pull close the features of the same class. Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance. Code is available at .
    Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. (arXiv:2110.14883v3 [cs.LG] UPDATED)
    The success of Transformer models has pushed the deep learning model scale to billions of parameters. Because of the limited memory of a single GPU, training such models requires parallelizing across multiple devices. However, the best practice for choosing the optimal parallel strategy is still lacking, since it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addresses this challenge by introducing a unified interface to scale sequential model training code to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism, as well as heterogeneous training methods integrated with the zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.
    Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. (arXiv:2310.03398v1 [cs.LG])
    We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization. (arXiv:2310.03456v1 [cs.CV])
    Temporal Action Localization (TAL) aims to identify actions' start, end, and class labels in untrimmed videos. While recent advancements using transformer networks and Feature Pyramid Networks (FPN) have enhanced visual feature recognition in TAL tasks, less progress has been made in the integration of audio features into such frameworks. This paper introduces the Multi-Resolution Audio-Visual Feature Fusion (MRAV-FF), an innovative method to merge audio-visual data across different temporal resolutions. Central to our approach is a hierarchical gated cross-attention mechanism, which discerningly weighs the importance of audio information at diverse temporal scales. Such a technique not only refines the precision of regression boundaries but also bolsters classification confidence. Importantly, MRAV-FF is versatile, making it compatible with existing FPN TAL architectures and offering a significant enhancement in performance when audio data is available.
    FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent. (arXiv:2310.03156v1 [cs.LG])
    The theoretical landscape of federated learning (FL) undergoes rapid evolution, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of these critical challenges. Amongst the diverse adjustments in hyperparameters, the adaptation of the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this critical need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FedHyper consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper catalyzes a remarkable surge in accuracy, augmenting it by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
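    The idea behind hypergradient-based learning rate adaptation can be shown with the generic hypergradient descent rule for plain SGD: the learning rate is increased when consecutive gradients point in the same direction and decreased when they oppose each other. The sketch below applies that generic rule to a one-dimensional quadratic. It is not FedHyper's global or local scheduler; the function names, the clamp at zero and the constants are assumptions made for illustration.

```python
def hypergradient_lr(lr, grad, prev_grad, beta):
    """Generic hypergradient update: lr <- lr + beta * <grad, prev_grad>.

    The result is clamped at zero (an illustrative safeguard, not part of the rule).
    """
    dot = sum(g * p for g, p in zip(grad, prev_grad))
    return max(lr + beta * dot, 0.0)


def minimize_quadratic(x0, lr0, beta, steps):
    """Minimize f(x) = 0.5 * x^2 (gradient = x) with an adaptive learning rate."""
    x, lr, prev_grad = x0, lr0, None
    for _ in range(steps):
        grad = [x]
        if prev_grad is not None:
            lr = hypergradient_lr(lr, grad, prev_grad, beta)
        x -= lr * grad[0]
        prev_grad = grad
    return x, lr


if __name__ == "__main__":
    x_final, lr_final = minimize_quadratic(x0=5.0, lr0=0.01, beta=0.001, steps=50)
    print(x_final, lr_final)
```

    Starting from a deliberately small learning rate, the rate grows while the gradients stay aligned, which is the mechanism that reduces sensitivity to the initial setting.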
    On the definition of toxicity in NLP. (arXiv:2310.02357v2 [cs.CL] UPDATED)
    The fundamental problem in the toxicity detection task is that toxicity is ill-defined. This forces us to rely on subjective and vague data when training models, which yields non-robust and inaccurate results: garbage in, garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. Alongside it, we also describe possible ways of applying this new definition to dataset creation and model training.
    Towards Understanding the Effect of Pretraining Label Granularity. (arXiv:2303.16887v2 [cs.CV] UPDATED)
    In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.
    Learning Energy Decompositions for Partial Inference of GFlowNets. (arXiv:2310.03301v1 [cs.LG])
    This paper studies generative flow networks (GFlowNets) to sample objects from the Boltzmann energy distribution via a sequence of actions. In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions. To this end, the recently developed forward-looking GFlowNet reparameterizes the flow functions based on evaluating the energy of intermediate states. However, such an evaluation of intermediate energies may (i) be too expensive or impossible to evaluate and (ii) even provide misleading training signals under large energy fluctuations along the sequence of actions. To resolve this issue, we propose learning energy decompositions for GFlowNets (LED-GFN). Our main idea is to (i) decompose the energy of an object into learnable potential functions defined on state transitions and (ii) reparameterize the flow functions using the potential functions. In particular, to produce informative local credits, we propose to regularize the potential to change smoothly over the sequence of actions. It is also noteworthy that training GFlowNet with our learned potential can preserve the optimal policy. We empirically verify the superiority of LED-GFN in five problems including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.
    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. (arXiv:2310.03529v1 [cs.LG])
    We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on group-theoretic arguments, particularly Schur's lemma, we give a simple proof of the universality of those DNNs.
    Relational Convolutional Networks: A framework for learning representations of hierarchical relations. (arXiv:2310.03240v1 [cs.LG])
    A maturing area of research in deep learning is the development of architectures that can learn explicit representations of relational features. In this paper, we focus on the problem of learning representations of hierarchical relations, proposing an architectural framework we call "relational convolutional networks". Given a sequence of objects, a "multi-dimensional inner product relation" module produces a relation tensor describing all pairwise relations. A "relational convolution" layer then transforms the relation tensor into a sequence of new objects, each describing the relations within some group of objects at the previous layer. Graphlet filters, analogous to filters in convolutional neural networks, represent a template of relations against which the relation tensor is compared at each grouping. Repeating this yields representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
    The Geometric Structure of Fully-Connected ReLU-Layers. (arXiv:2310.03482v1 [cs.LG])
    We formalize and interpret the geometric structure of $d$-dimensional fully connected ReLU-layers in neural networks. The parameters of a ReLU-layer induce a natural partition of the input domain, such that in each sector of the partition, the ReLU-layer can be greatly simplified. This leads to a geometric interpretation of a ReLU-layer as a projection onto a polyhedral cone followed by an affine transformation, in line with the description in [doi:10.48550/arXiv.1905.08922] for convolutional networks with ReLU activations. Further, this structure facilitates simplified expressions for preimages of the intersection between partition sectors and hyperplanes, which is useful when describing decision boundaries in a classification setting. We investigate this in detail for a feed-forward network with one hidden ReLU-layer, where we provide results on the geometric complexity of the decision boundary generated by such networks, as well as proving that modulo an affine transformation, such a network can only generate $d$ different decision boundaries. Finally, the effect of adding more layers to the network is discussed.
    Towards out-of-distribution generalizable predictions of chemical kinetics properties. (arXiv:2310.03152v1 [cs.LG])
    Machine Learning (ML) techniques have found applications in estimating chemical kinetics properties. With the accumulated drug molecules identified through "AI4drug discovery", the next imperative lies in AI-driven design for high-throughput chemical synthesis processes, with the estimation of properties of unseen reactions with unexplored molecules. To this end, the existing ML approaches for kinetics property prediction are required to be Out-Of-Distribution (OOD) generalizable. In this paper, we categorize the OOD kinetic property prediction into three levels (structure, condition, and mechanism), revealing unique aspects of such problems. Under this framework, we create comprehensive datasets to benchmark (1) the state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) the state-of-the-art graph OOD methods in kinetics property prediction problems. Our results demonstrated the challenges and opportunities in OOD kinetics property prediction. Our datasets and benchmarks can further support research in this direction.
    Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation. (arXiv:2310.03112v1 [stat.ML])
    Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striving for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
    Mitigating Pilot Contamination and Enabling IoT Scalability in Massive MIMO Systems. (arXiv:2310.03278v1 [cs.IT])
    Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orthogonal pilot sequences, which results in dedicating more space of coherence block to pilot transmission than data transmission. This, in turn, also hinders the scalability of massive MIMO systems, particularly in accommodating a large number of IoT devices within a cell. To overcome these challenges, this paper devises an innovative pilot allocation scheme based on the data transfer patterns of IoT devices. The scheme assigns orthogonal pilot sequences to clusters of devices instead of individual devices, allowing multiple devices to utilize the same pilot for periodically transmitting data. Moreover, we formulate the pilot assignment problem as a graph coloring problem and use the max k-cut graph partitioning approach to overcome the pilot contamination in a multicell massive MIMO system. The proposed scheme significantly improves the spectral efficiency and enables the scalability of massive MIMO systems; for instance, by using ten orthogonal pilot sequences, we are able to accommodate 200 devices with only a 12.5% omission rate.
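To make the "pilot assignment as graph partitioning" idea concrete, here is a minimal greedy heuristic for max k-cut: nodes stand for device clusters, edge weights stand for potential interference, and each of the k labels stands for one orthogonal pilot. This is an illustrative heuristic under those assumptions, not the paper's algorithm; `greedy_max_k_cut`, `uncut_weight`, and the toy weights are hypothetical.

```python
def greedy_max_k_cut(num_nodes, weights, k):
    """Give each node one of k labels, greedily minimising same-label edge weight.

    weights maps (u, v) pairs with u < v to a non-negative interference weight.
    """
    neighbours = {u: [] for u in range(num_nodes)}
    for (u, v), w in weights.items():
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    label = {}
    for u in range(num_nodes):
        # Cost of a label = total weight to already-labelled neighbours that share it
        cost = [0.0] * k
        for v, w in neighbours[u]:
            if v in label:
                cost[label[v]] += w
        label[u] = cost.index(min(cost))  # ties go to the lowest label index
    return label


def uncut_weight(weights, label):
    # Total weight of edges whose endpoints share a label (residual contamination)
    return sum(w for (u, v), w in weights.items() if label[u] == label[v])


# Toy graph: three mutually interfering clusters, but only two pilots available.
toy_weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 5.0}
toy_labels = greedy_max_k_cut(3, toy_weights, k=2)
```

With two pilots the heuristic separates the heavily interfering pair (1, 2) and accepts the lighter conflict between 0 and 2, so the residual same-pilot weight is 1.0.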
    Molecule Design by Latent Prompt Transformer. (arXiv:2310.03253v1 [cs.LG])
    This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of a molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state-of-the-art performance on several benchmark molecule design tasks.
    Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets. (arXiv:2310.03258v1 [cs.LG])
    Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges.
    Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors. (arXiv:2310.03166v1 [cs.CR])
    Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness because they do not optimize the usage of the adopted manipulations, and they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations that allow modifying the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.
    History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization. (arXiv:2310.03228v1 [cs.LG])
    History matching based on monitoring data will enable uncertainty reduction, and thus improved aquifer management, in industrial-scale carbon storage operations. In traditional model-based data assimilation, geomodel parameters are modified to force agreement between flow simulation results and observations. In data-space inversion (DSI), history-matched quantities of interest, e.g., posterior pressure and saturation fields conditioned to observations, are inferred directly, without constructing posterior geomodels. This is accomplished efficiently using a set of O(1000) prior simulation results, data parameterization, and posterior sampling within a Bayesian setting. In this study, we develop and implement (in DSI) a deep-learning-based parameterization to represent spatio-temporal pressure and CO2 saturation fields at a set of time steps. The new parameterization uses an adversarial autoencoder (AAE) for dimension reduction and a convolutional long short-term memory (convLSTM) network to represent the spatial distribution and temporal evolution of the pressure and saturation fields. This parameterization is used with an ensemble smoother with multiple data assimilation (ESMDA) in the DSI framework to enable posterior predictions. A realistic 3D system characterized by prior geological realizations drawn from a range of geological scenarios is considered. A local grid refinement procedure is introduced to estimate the error covariance term that appears in the history matching formulation. Extensive history matching results are presented for various quantities, for multiple synthetic true models. Substantial uncertainty reduction in posterior pressure and saturation fields is achieved in all cases. The framework is applied to efficiently provide posterior predictions for a range of error covariance specifications. Such an assessment would be expensive using a model-based approach.
    Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach. (arXiv:2309.14073v2 [stat.ML] UPDATED)
    We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.
    Knowledge Distillation Under Ideal Joint Classifier Assumption. (arXiv:2304.11004v2 [cs.LG] UPDATED)
    Knowledge distillation constitutes a potent methodology for condensing substantial neural networks into more compact and efficient counterparts. Within this context, softmax regression representation learning serves as a widely embraced approach, leveraging a pre-established teacher network to guide the learning process of a diminutive student network. Notably, despite the extensive inquiry into the efficacy of softmax regression representation learning, the intricate underpinnings governing the knowledge transfer mechanism remain inadequately elucidated. This study introduces the 'Ideal Joint Classifier Knowledge Distillation' (IJCKD) framework, an overarching paradigm that not only furnishes a lucid and exhaustive comprehension of prevailing knowledge distillation techniques but also establishes a theoretical underpinning for prospective investigations. Employing mathematical methodologies derived from domain adaptation theory, this investigation conducts a comprehensive examination of the error boundary of the student network contingent upon the teacher network. Consequently, our framework facilitates efficient knowledge transference between teacher and student networks, thereby accommodating a diverse spectrum of applications.
    PoseAction: Action Recognition for Patients in the Ward using Deep Learning Approaches. (arXiv:2310.03288v1 [cs.CV])
    Real-time intelligent detection and prediction of subjects' behavior, particularly their movements or actions, is critical in the ward. This approach offers the advantage of reducing in-hospital care costs and improving the efficiency of healthcare workers, which is especially true for scenarios at night or during peak admission periods. Therefore, in this work, we propose using computer vision (CV) and deep learning (DL) methods for detecting subjects and recognizing their actions. We utilize OpenPose as an accurate subject detector for recognizing the positions of human subjects in the video stream. Additionally, we employ AlphAction's Asynchronous Interaction Aggregation (AIA) network to predict the actions of detected subjects. We refer to this integrated model as PoseAction. The proposed model is further trained to predict 12 common actions in ward areas, such as staggering, chest pain, and falling down, using medical-related video clips from the NTU RGB+D and NTU RGB+D 120 datasets. The results demonstrate that PoseAction achieves the highest classification mAP of 98.72% (IoU@0.5). Additionally, this study develops an online real-time mode for action recognition, which strongly supports the clinical translation of PoseAction. Furthermore, using OpenPose's function for recognizing face key points, we also implement face blurring, which is a practical solution to address the privacy protection concerns of patients and healthcare workers. Nevertheless, the training data for PoseAction is currently limited, particularly in terms of label diversity. Consequently, the subsequent step involves utilizing a more diverse dataset (including general actions) to train the model's parameters for improved generalization.
    Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing. (arXiv:2310.03052v1 [cs.LG])
    Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers, which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian theory, a major theory explaining human memory formation, to enhance long-term dependencies in neural networks. Memoria stores and retrieves information called engrams at multiple memory levels (working memory, short-term memory, and long-term memory), using connection weights that change according to Hebb's rule. Through experiments with popular Transformer-based models like BERT and GPT, we show that Memoria significantly improves the ability to consider long-term dependencies in various tasks. Results show that Memoria outperformed existing methodologies in sorting, language modeling, and long text classification.
    DP-SGD for non-decomposable objective functions. (arXiv:2310.03104v1 [cs.LG])
    Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity can grow with increasing batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity-based loss functions -- in particular the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$. We test our DP-SGD variant on some preliminary CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
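As background for the sensitivity issue described above, the sketch below shows the standard DP-SGD gradient step for a decomposable loss: each per-example gradient is clipped to a fixed $L_2$ norm, the clipped gradients are summed, and Gaussian noise scaled to that norm is added. It is the baseline recipe, not the paper's variant for contrastive losses; the function names and the pure-Python list representation are assumptions made to keep the example self-contained.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Standard DP-SGD estimate: clip each example, sum, add noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Changing one example moves the sum by at most max_norm per clipped term,
    # so the noise standard deviation is tied to max_norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [x / len(per_example_grads) for x in noisy]


# With the noise switched off, the result is just the mean of the clipped gradients.
estimate = dp_sgd_gradient([[3.0, 4.0], [0.3, 0.4]], max_norm=1.0,
                           noise_multiplier=0.0, rng=random.Random(0))
```

The recipe relies on the loss being a sum of independent per-example terms. A contrastive loss couples every example with the rest of the batch, so this per-example bound no longer holds, which is the problem the abstract addresses.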
    How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT Responses. (arXiv:2310.03031v1 [cs.CL])
    With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs and thus an awareness of their inherent limitations, and will therefore take the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues with a special focus on gender biases, which users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.
    Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data. (arXiv:2310.03111v1 [cs.LG])
    Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between or independent to each modality. We parameterize the latents of our model in the Fourier domain, and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) is able not only to identify the shared and independent latent structure across modalities accurately, but also to provide good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
    PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks. (arXiv:2310.03212v1 [cs.LG])
    Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, we achieve 83.55% accuracy while requiring 87.26% fewer parameters and 32.27% and 47.40% fewer MACs and FLOPs, respectively, achieving 3x faster inference and 7.29J less energy consumption on a 2080Ti GPU with 11GB VRAM compared to CapsNet on the CIFAR-10 dataset.
    Regret Analysis of Distributed Online Control for LTI Systems with Adversarial Disturbances. (arXiv:2310.03206v1 [math.OC])
    This paper addresses the distributed online control problem over a network of linear time-invariant (LTI) systems (with possibly unknown dynamics) in the presence of adversarial perturbations. There exists a global network cost that is characterized by a time-varying convex function, which evolves in an adversarial manner and is sequentially and partially observed by local agents. The goal of each agent is to generate a control sequence that can compete with the best centralized control policy in hindsight, which has access to the global cost. This problem is formulated as a regret minimization. For the case of known dynamics, we propose a fully distributed disturbance feedback controller that guarantees a regret bound of $O(\sqrt{T}\log T)$, where $T$ is the time horizon. For the unknown dynamics case, we design a distributed explore-then-commit approach, where in the exploration phase all agents jointly learn the system dynamics, and in the learning phase our proposed control algorithm is applied using each agent system estimate. We establish a regret bound of $O(T^{2/3} \text{poly}(\log T))$ for this setting.
    Sharpness-Aware Minimization and the Edge of Stability. (arXiv:2309.12488v3 [cs.LG] UPDATED)
    Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
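The $2/\eta$ threshold for plain gradient descent follows from the local quadratic picture: on $f(x) = \tfrac{1}{2}\lambda x^2$ each step multiplies $x$ by $(1 - \eta\lambda)$, which shrinks only when $\lambda < 2/\eta$. The toy check below illustrates that GD case only; it does not reproduce the paper's SAM-edge, and the function name and the chosen curvatures are assumptions for illustration.

```python
def gd_on_quadratic(curvature, step_size, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        x -= step_size * curvature * x  # gradient of f is curvature * x
    return x


eta = 0.1
edge = 2 / eta  # the "edge of stability" for this step size

below = gd_on_quadratic(0.9 * edge, eta)  # per-step factor -0.8: oscillates and decays
above = gd_on_quadratic(1.1 * edge, eta)  # per-step factor -1.2: oscillates and blows up
```

Curvature just below the edge gives iterates that shrink toward zero, while curvature just above it gives iterates whose magnitude grows without bound.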
    Synergistic Fusion of Graph and Transformer Features for Enhanced Molecular Property Prediction. (arXiv:2310.03027v1 [physics.chem-ph])
    Molecular property prediction is a critical task in computational drug discovery. While recent advances in Graph Neural Networks (GNNs) and Transformers have been shown to be effective and promising, they face the following limitations: Transformer self-attention does not explicitly consider the underlying molecule structure, while GNN feature representation alone is not sufficient to capture granular and hidden interactions and characteristics that distinguish similar molecules. To address these limitations, we propose SYN-FUSION, a novel approach that synergistically combines pre-trained features from GNNs and Transformers. This approach provides a comprehensive molecular representation, capturing both the global molecule structure and the individual atom characteristics. Experimental results on MoleculeNet benchmarks demonstrate superior performance, surpassing previous models in 5 out of 7 classification datasets and 4 out of 6 regression datasets. The performance of SYN-FUSION has been compared with other Graph-Transformer models that have been jointly trained using a combination of transformer and graph features, and it is found that our approach is on par with those models in terms of performance. Extensive analysis of the learned fusion model across aspects such as loss, latent space, and weight distribution further validates the effectiveness of SYN-FUSION. Finally, an ablation study unequivocally demonstrates that the synergy achieved by SYN-FUSION surpasses the performance of its individual model components and their ensemble, offering a substantial improvement in predicting molecular properties.
    Discovering Knowledge-Critical Subnetworks in Pretrained Language Models. (arXiv:2310.03084v1 [cs.CL])
    Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs responsible for encoding specific knowledge the model has memorized. We propose a multi-objective differentiable weight masking scheme to discover these subnetworks and show that we can use them to precisely remove specific knowledge from models while minimizing adverse effects on the behavior of the original language model. We demonstrate our method on multiple GPT2 variants, uncovering highly sparse subnetworks (98%+) that are solely responsible for specific collections of relational knowledge. When these subnetworks are removed, the remaining network maintains most of its initial capacity (modeling language and other memorized relational knowledge) but struggles to express the removed knowledge, and suffers performance drops on examples needing this removed knowledge on downstream tasks after finetuning.
    Assessment of Prediction Intervals Using Uncertainty Characteristics Curves. (arXiv:2310.03158v1 [cs.LG])
    Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
    Physics-Informed Neural Networks for Accelerating Power System State Estimation. (arXiv:2310.03088v1 [cs.LG])
    State estimation is the cornerstone of the power system control center since it provides the operating condition of the system in consecutive time intervals. This work investigates the application of physics-informed neural networks (PINNs) for accelerating power systems state estimation in monitoring the operation of power systems. Traditional state estimation techniques often rely on iterative algorithms that can be computationally intensive, particularly for large-scale power systems. In this paper, a novel approach that leverages the inherent physical knowledge of power systems through the integration of PINNs is proposed. By incorporating physical laws as prior knowledge, the proposed method significantly reduces the computational complexity associated with state estimation while maintaining high accuracy. The proposed method achieves up to 11% increase in accuracy, 75% reduction in standard deviation of results, and 30% faster convergence, as demonstrated by comprehensive experiments on the IEEE 14-bus system.
    Creating an Atlas of Normal Tissue for Pruning WSI Patching Through Anomaly Detection. (arXiv:2310.03106v1 [eess.IV])
    Patching gigapixel whole slide images (WSIs) is an important task in computational pathology. Some methods have been proposed to select a subset of patches as WSI representation for downstream tasks. While most of the computational pathology tasks are designed to classify or detect the presence of pathological lesions in each WSI, the confounding role and redundant nature of normal histology in tissue samples are generally overlooked in WSI representations. In this paper, we propose and validate the concept of an "atlas of normal tissue" solely using samples of WSIs obtained from normal tissue biopsies. Such atlases can be employed to eliminate normal fragments of tissue samples and hence increase the representativeness of the collection of patches. We tested our proposed method by establishing a normal atlas using 107 normal skin WSIs and demonstrated how established indexes and search engines like Yottixel can be improved. We used 553 WSIs of cutaneous squamous cell carcinoma (cSCC) to show the advantage. We also validated our method applied to an external dataset of 451 breast WSIs. The number of selected WSI patches was reduced by 30% to 50% after utilizing the proposed normal atlas while maintaining the same indexing and search performance in leave-one-patient-out validation for both datasets. We show that the proposed normal atlas shows promise for unsupervised selection of the most representative patches of the abnormal/malignant WSI lesions.
    Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning. (arXiv:2310.03094v1 [cs.CL])
    Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
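    The routing idea in this abstract (accept the weaker model's answer when its sampled answers agree, otherwise escalate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `cascade_answer`, the two callables, the sample count, and the agreement threshold are all hypothetical, and simple majority agreement stands in for the paper's consistency checks.

```python
from collections import Counter
from typing import Callable, List, Tuple


def cascade_answer(
    question: str,
    weak_sample: Callable[[str], str],    # hypothetical: one sampled answer from the cheaper model
    strong_answer: Callable[[str], str],  # hypothetical: one answer from the expensive model
    n_samples: int = 5,                   # assumed sample count
    threshold: float = 0.8,               # assumed agreement threshold
) -> Tuple[str, str]:
    """Return (answer, source), where source is "weak" or "strong"."""
    # Sample the weaker model several times on the same question.
    samples: List[str] = [weak_sample(question) for _ in range(n_samples)]

    # Use the share of samples agreeing with the most common answer
    # as a proxy for how easy the question is.
    top_answer, top_count = Counter(samples).most_common(1)[0]
    agreement = top_count / n_samples

    if agreement >= threshold:
        return top_answer, "weak"

    # Inconsistent answers: treat the question as hard and escalate.
    return strong_answer(question), "strong"
```

    In this sketch the stronger model is called only when the weaker model disagrees with itself, which is where any cost saving would come from; the threshold trades cost against accuracy.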
    QuATON: Quantization Aware Training of Optical Neurons. (arXiv:2310.03049v1 [cs.LG])
    Optical neural architectures (ONAs) use coding elements with optimized physical parameters to perform intelligent measurements. However, fabricating ONAs while maintaining design performances is challenging. Limitations in fabrication techniques often limit the realizable precision of the trained parameters. Physical constraints may also limit the range of values the physical parameters can hold. Thus, ONAs should be trained within the implementable constraints. However, such physics-based constraints reduce the training objective to a constrained optimization problem, making it harder to optimize with existing gradient-based methods. To alleviate these critical issues that degrade performance from simulation to realization we propose a physics-informed quantization-aware training framework. Our approach accounts for the physical constraints during the training process, leading to robust designs. We evaluate our approach on an ONA proposed in the literature, named a diffractive deep neural network (D2NN), for all-optical phase imaging and for classification of phase objects. With extensive experiments on different quantization levels and datasets, we show that our approach leads to ONA designs that are robust to quantization noise.
    Enhancing Accuracy in Deep Learning Using Random Matrix Theory. (arXiv:2310.03165v1 [cs.LG])
    In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
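    As a rough illustration of discarding small singular values of a weight matrix, the sketch below truncates an SVD at a noise-level cutoff. The cutoff used here, `noise_std * (sqrt(rows) + sqrt(cols))`, is the asymptotic largest singular value of a pure-noise matrix with i.i.d. entries of standard deviation `noise_std`; it is an assumption chosen for the example and is not claimed to be the paper's criterion. The function name and the `noise_std` parameter are hypothetical.

```python
from typing import Tuple

import numpy as np


def prune_small_singular_values(
    weight: np.ndarray, noise_std: float
) -> Tuple[np.ndarray, int]:
    """Drop singular values at or below a noise-level cutoff.

    Returns the low-rank reconstruction and the number of singular values kept.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    n_rows, n_cols = weight.shape

    # Assumed cutoff: asymptotic top singular value of an i.i.d. noise matrix.
    cutoff = noise_std * (np.sqrt(n_rows) + np.sqrt(n_cols))
    keep = s > cutoff

    # Rebuild the matrix from the retained singular triplets only.
    pruned = (u[:, keep] * s[keep]) @ vt[keep, :]
    return pruned, int(keep.sum())
```

    Storing the retained factors instead of the reconstructed matrix reduces the parameter count of a layer when few singular values survive the cutoff.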
    The Cadenza ICASSP 2024 Grand Challenge. (arXiv:2310.03480v1 [eess.AS])
    The Cadenza project aims to enhance the audio quality of music for individuals with hearing loss. As part of this, the project is organizing the ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The challenge can be tackled by decomposing the music at the hearing aid microphones into vocals, bass, drums, and other components. These can then be intelligently remixed in a personalized manner to improve audio quality. Alternatively, an end-to-end approach could be used. Processes need to consider the music itself, the gain applied to each component, and the listener's hearing loss. The submitted entries will be evaluated using the intrusive objective metric, the Hearing Aid Audio Quality Index (HAAQI). This paper outlines the challenge.
    Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms. (arXiv:2310.03225v1 [cs.LG])
    Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present a generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution of the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while properly penalizing unsafe explorations before actual safety violation to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing with a high probability that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
    TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design. (arXiv:2310.03223v1 [cs.LG])
    We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
    Formal and Practical Elements for the Certification of Machine Learning Systems. (arXiv:2310.03217v1 [cs.LG])
    Over the past decade, machine learning has demonstrated impressive results, often surpassing human capabilities in sensing tasks relevant to autonomous flight. Unlike traditional aerospace software, the parameters of machine learning models are not hand-coded nor derived from physics but learned from data. They are automatically adjusted during a training phase, and their values do not usually correspond to physical requirements. As a result, requirements cannot be directly traced to lines of code, hindering the current bottom-up aerospace certification paradigm. This paper attempts to address this gap by 1) demystifying the inner workings and processes to build machine learning models, 2) formally establishing theoretical guarantees given by those processes, and 3) complementing these formal elements with practical considerations to develop a complete certification argument for safety-critical machine learning systems. Based on a scalable statistical verifier, our proposed framework is model-agnostic and tool-independent, making it adaptable to many use cases in the industry. We demonstrate results on a widespread application in autonomous flight: vision-based landing.
    Fragment-based Pretraining and Finetuning on Molecular Graphs. (arXiv:2310.03274v1 [cs.LG])
    Property prediction on molecular graphs is an important application of Graph Neural Networks (GNNs). Recently, unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain. In this work, we propose pretraining GNNs at the fragment level, which serves as a promising middle ground to overcome the limitations of node-level and graph-level pretraining. Borrowing techniques from recent work on principal subgraph mining, we obtain a compact vocabulary of prevalent fragments that span a large pretraining dataset. From the extracted vocabulary, we introduce several fragment-based contrastive and predictive pretraining tasks. The contrastive learning task jointly pretrains two different GNNs: one based on molecular graphs and one based on fragment graphs, which represents high-order connectivity within molecules. By enforcing the consistency between the fragment embedding and the aggregated embedding of the corresponding atoms from the molecular graphs, we ensure that both embeddings capture structural information at multiple resolutions. The structural information of the fragment graphs is further exploited to extract auxiliary labels for the graph-level predictive pretraining. We employ both the pretrained molecular-based and fragment-based GNNs for downstream prediction, thus utilizing the fragment information during finetuning. Our models advance the performances on 5 out of 8 common molecular benchmarks and improve the performances on long-range biological benchmarks by at least 11.5%.
    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. (arXiv:2310.03530v1 [cs.LG])
    The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction. (arXiv:2310.03030v1 [physics.chem-ph])
    With the emergence of Transformer architectures and their powerful understanding of textual data, a new horizon has opened up to predict the molecular properties based on text description. While SMILES are the most common form of representation, they lack robustness, rich information, and canonicity, which limits their effectiveness in becoming generalizable representations. Here, we present GPT-MolBERTa, a self-supervised large language model (LLM) which uses detailed textual descriptions of molecules to predict their properties. Text-based descriptions of 326000 molecules were collected using ChatGPT and used to train the LLM to learn the representation of molecules. To predict the properties for the downstream tasks, both BERT and RoBERTa models were used in the finetuning stage. Experiments show that GPT-MolBERTa performs well on various molecule property benchmarks, approaching state-of-the-art performance in regression tasks. Additionally, further analysis of the attention mechanisms shows that GPT-MolBERTa is able to pick up important information from the input textual data, displaying the interpretability of the model.
    A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback. (arXiv:2310.03043v1 [cs.LG])
    Interactive search can provide a better experience by incorporating interaction feedback from the users. This can significantly improve search accuracy as it helps avoid irrelevant information and captures the users' search intents. Existing state-of-the-art (SOTA) systems use reinforcement learning (RL) models to incorporate the interactions but focus on item-level feedback, ignoring the fine-grained information found in sentence-level feedback. Yet such feedback requires extensive RL action space exploration and large amounts of annotated data. This work addresses these challenges by proposing a new deep Q-learning (DQ) approach, DQrank. DQrank adapts BERT-based models, the SOTA in natural language processing, to select crucial sentences based on users' engagement and rank the items to obtain more satisfactory responses. We also propose two mechanisms to better explore optimal actions. DQrank further utilizes the experience replay mechanism in DQ to store the feedback sentences to obtain a better initial ranking performance. We validate the effectiveness of DQrank on three search datasets. The results show that DQrank performs at least 12% better than the previous SOTA RL approaches. We also conduct detailed ablation studies. The ablation results demonstrate that each model component can efficiently extract and accumulate long-term engagement effects from the users' sentence-level feedback. This structure offers new technologies with promising performance to construct a search system with sentence-level interaction.
    Differentiable Chemical Physics by Geometric Deep Learning for Gradient-based Property Optimization of Mixtures. (arXiv:2310.03047v1 [physics.chem-ph])
    Chemical mixtures, satisfying multi-objective performance metrics and constraints, enable their use in chemical processes and electrochemical devices. In this work, we develop a differentiable chemical-physics framework for modeling chemical mixtures, DiffMix, where geometric deep learning (GDL) is leveraged to map from molecular species, compositions and environment conditions, to physical coefficients in the mixture physics laws. In particular, we extend mixture thermodynamic and transport laws by creating learnable physical coefficients, where we use graph neural networks as the molecule encoder and enforce component-wise permutation-invariance. We start our model evaluations with thermodynamics of binary mixtures, and further benchmark multicomponent electrolyte mixtures on their transport properties, in order to test the model generalizability. We show that DiffMix improves prediction accuracy and model robustness over its purely data-driven variants. Furthermore, we demonstrate the efficient optimization of electrolyte transport properties, built on the gradient obtained using DiffMix auto-differentiation. Our simulation runs are then backed up by the data generated by a robotic experimentation setup, Clio. By combining mixture physics and GDL, DiffMix expands the predictive modeling methods for chemical mixtures and provides low-cost optimization approaches in large chemical spaces.
    Modified LAB Algorithm with Clustering-based Search Space Reduction Method for solving Engineering Design Problems. (arXiv:2310.03055v1 [cs.LG])
    A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor introducing inter-group competition and iteratively narrowing down the sample space. The algorithm is validated by solving the benchmark test problems from CEC 2005 and CEC 2017. The solutions are validated using standard statistical tests such as two-sided and pairwise signed rank Wilcoxon test and Friedman rank test. The algorithm exhibited improved and superior robustness as well as search space exploration capabilities. Furthermore, a Clustering-Based Search Space Reduction (C-SSR) method is proposed, making the algorithm capable of solving constrained problems. The C-SSR method enables the algorithm to identify clusters of feasible regions, satisfying the constraints and contributing to achieving the optimal solution. This method demonstrates its effectiveness as a potential alternative to traditional constraint handling techniques. The results obtained using the Modified LAB algorithm are then compared with those achieved by other recent metaheuristic algorithms.
    Test Case Recommendations with Distributed Representation of Code Syntactic Features. (arXiv:2310.03174v1 [cs.LG])
    Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, time-consuming, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach which exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to the developers. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. Retrieving the semantic and structural properties of a given method, the approach computes cosine similarity between the method's embedding and the previously-embedded training instances. Further, according to the similarity scores between the embedding vectors, the model identifies the closest methods of embedding and the associated unit tests as the most similar recommendations. The results on the Methods2Test dataset showed that, while there is no guarantee to have similar relevant test cases for the group of similar methods, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that recommended test cases decrease the developers' effort in generating expected test cases.
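    The retrieval step described in this abstract (cosine similarity between a method's embedding and previously embedded instances, then picking the closest ones) can be sketched in a few lines of NumPy. The embeddings here are plain arrays supplied by the caller; how they are produced is outside the sketch, and the function name `top_k_similar` and parameter `k` are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def top_k_similar(
    query: np.ndarray, corpus: np.ndarray, k: int = 3
) -> List[Tuple[int, float]]:
    """Return (row index, cosine similarity) for the k corpus rows closest to query.

    query has shape (d,), corpus has shape (n, d); all vectors are assumed non-zero.
    """
    # Normalise so that a dot product equals the cosine of the angle.
    query_unit = query / np.linalg.norm(query)
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)

    scores = corpus_unit @ query_unit

    # Sort by descending similarity and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

    In a recommendation setting of this kind, each returned index would be mapped back to the unit tests associated with that stored method.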
    Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs. (arXiv:2310.03221v1 [cs.LG])
    Knowledge graphs (KGs) have emerged as a powerful framework for representing and integrating complex biomedical information. However, assembling KGs from diverse sources remains a significant challenge in several aspects, including entity alignment, scalability, and the need for continuous updates to keep pace with scientific advancements. Moreover, the representative power of KGs is often limited by the scarcity of multi-modal data integration. To overcome these challenges, we propose Know2BIO, a general-purpose heterogeneous KG benchmark for the biomedical domain. Know2BIO integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories. It currently consists of ~219,000 nodes and ~6,200,000 edges. Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science. Furthermore, Know2BIO is accompanied by multi-modal data: node features including text descriptions, protein and compound sequences and structures, enabling the utilization of emerging natural language processing methods and multi-modal data integration strategies. We evaluate KG representation models on Know2BIO, demonstrating its effectiveness as a benchmark for KG representation learning in the biomedical field. Data and source code of Know2BIO are available at https://github.com/Yijia-Xiao/Know2BIO/.
    Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models. (arXiv:2310.03059v1 [cs.CV])
    The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/EvenJoker/Point-PEFT.
    Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising. (arXiv:2310.03085v1 [cs.LG])
    We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such a regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method that was limited to low-dimensional signals, each iteration using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements we show that this method can be applied for patch based image denoising.
    Crossed-IoT device portability of Electromagnetic Side Channel Analysis: Challenges and Dataset. (arXiv:2310.03119v1 [cs.LG])
    IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on these connected devices. In particular, EM-SCA has become an essential tool for IoT forensics due to its ability to reveal confidential information about the internal workings of IoT devices without interfering with these devices or wiretapping their networks. However, the accuracy and reliability of EM-SCA results can be limited by device variability, environmental factors, and data collection and processing methods. Besides, there is very little research on these limitations, which significantly affect the accuracy of EM-SCA approaches for crossed-IoT device portability, and limited research on possible solutions to address this challenge. Therefore, this empirical study examines the impact of device variability on the accuracy and reliability of EM-SCA approaches, in particular machine-learning (ML) based approaches for EM-SCA. We first present the background, basic concepts and techniques used to evaluate the limitations of current EM-SCA approaches and datasets. Our study then addresses one of the most important limitations, which is caused by the multi-core architecture of the processors (SoC). We present an approach to collect the EM-SCA datasets and demonstrate the feasibility of using transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices. Our study moreover contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regard to cross-device portability.
    BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields. (arXiv:2310.03563v1 [cs.CV])
    We aim to improve the Inverted Neural Radiance Fields (iNeRF) algorithm which defines the image pose estimation problem as a NeRF based iterative linear optimization. NeRFs are novel neural space representation models that can synthesize photorealistic novel views of real-world scenes or objects. Our contributions are as follows: we extend the localization optimization objective with a depth-based loss function, we introduce a multi-image based loss function where a sequence of images with known relative poses are used without increasing the computational complexity, we omit hierarchical sampling during volumetric rendering, meaning only the coarse model is used for pose estimation, and we show that by extending the sampling interval, convergence can be achieved even for higher initial pose estimate errors. With the proposed modifications the convergence speed is significantly improved, and the basin of convergence is substantially extended.
    Learning Concept-Based Visual Causal Transition and Symbolic Reasoning for Visual Planning. (arXiv:2310.03325v1 [cs.AI])
    Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual task planning. Empirically, we show that our framework can generalize to unseen task trajectories and unseen object categories.
    LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework. (arXiv:2310.03342v1 [cs.LG])
    In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
    Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. (arXiv:2310.03054v1 [stat.ML])
    We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.
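    As a rough illustration of the quantity named in the abstract above: the squared MMD with the negative distance kernel K(x, y) = -|x - y| equals the energy distance 2 E|X - Y| - E|X - X'| - E|Y - Y'|. The sketch below is a brute-force one-dimensional sample estimate, not the sliced and sorted computation or the flows the paper describes; the function names `mean_abs_diff` and `energy_distance` are hypothetical.

```python
def mean_abs_diff(xs, ys):
    """Average of |x - y| over all pairs (x, y) from the two samples."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Plug-in (V-statistic) estimate of the 1-D energy distance.

    Equals the squared MMD with kernel K(x, y) = -|x - y|.
    Brute force, O(len(xs) * len(ys)); illustrative only.
    """
    cross = mean_abs_diff(xs, ys)
    within_x = mean_abs_diff(xs, xs)
    within_y = mean_abs_diff(ys, ys)
    return 2.0 * cross - within_x - within_y
```

    The estimate is zero when both samples coincide and grows as the samples move apart, which is what makes it usable as a discrepancy to descend along.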
    Fine-tune Language Models to Approximate Unbiased In-context Learning. (arXiv:2310.03331v1 [cs.LG])
    In-context learning (ICL) is an astonishing emergent ability of large language models (LLMs). By presenting a prompt that includes multiple input-output pairs as examples and introducing a new query input, models can generate the corresponding output. However, the performance of models heavily relies on the quality of the input prompt when implementing in-context learning. Biased or imbalanced input prompts can significantly degrade the performance of language models. To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning). This algorithm fine-tunes language models using an unbiased validation set to determine the optimal weight for each input-output example to approximate unbiased in-context learning. Furthermore, we also introduce a low-cost reweighted algorithm, a linear optimal weight approximation algorithm called LARICL (Linear Approximation of Reweighted In-context Learning). This algorithm requires minimal training cost while providing effective results. We prove the convergence of our algorithm and validate its performance through experiments conducted on a numerical dataset. The experimental findings reveal a substantial improvement in comparison to benchmarks including the performance of casual prompt-based in-context learning and the performance of a classic fine-tuning method.
    The Blame Problem in Evaluating Local Explanations, and How to Tackle it. (arXiv:2310.03466v1 [cs.LG])
    The number of local model-agnostic explanation techniques proposed has grown rapidly recently. One main reason is that the bar for developing new explainability techniques is low due to the lack of optimal evaluation measures. Without rigorous measures, it is hard to have concrete evidence of whether the new explanation techniques can significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this proposed taxonomy, we highlight that all categories of evaluation methods, except those based on the ground truth from interpretable models, suffer from a problem we call the "blame problem." In our study, we argue that this category of evaluation measure is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category of evaluation measures has further limitations. The evaluation of local explanations remains an open research problem.
    Certifiably Robust Graph Contrastive Learning. (arXiv:2310.03312v1 [cs.CR])
    Graph Contrastive Learning (GCL) has emerged as a popular unsupervised graph representation learning method. However, it has been shown that GCL is vulnerable to adversarial attacks on both the graph structure and node attributes. Although empirical approaches have been proposed to enhance the robustness of GCL, the certifiable robustness of GCL remains unexplored. In this paper, we develop the first certifiably robust framework in GCL. Specifically, we first propose a unified criterion to evaluate and certify the robustness of GCL. We then introduce a novel technique, RES (Randomized Edgedrop Smoothing), to ensure certifiable robustness for any GCL model, and this certified robustness can be provably preserved in downstream tasks. Furthermore, an effective training method is proposed for robust GCL. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed method in providing effective certifiable robustness and enhancing the robustness of any GCL model. The source code of RES is available at https://github.com/ventr1c/RES-GCL.
    SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. (arXiv:2310.03684v1 [cs.LG])
    Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
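    The perturb-then-aggregate idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `is_objectionable` is a hypothetical stand-in for "query the LLM with this prompt and judge its response", character replacement is only one possible perturbation, and the defaults `n_copies=10` and `rate=0.1` are arbitrary.

```python
import random
import string

# Characters used for random replacement (an assumption of this sketch).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def perturb(prompt, rate, rng):
    """Replace a fraction `rate` of the characters at random positions."""
    chars = list(prompt)
    n_swap = int(len(chars) * rate)
    for idx in rng.sample(range(len(chars)), n_swap):
        chars[idx] = rng.choice(ALPHABET)
    return "".join(chars)


def smoothed_flag(prompt, is_objectionable, n_copies=10, rate=0.1, seed=0):
    """Majority vote of a per-copy judgment over randomly perturbed copies.

    `is_objectionable` is a hypothetical callable mapping a prompt string
    to True/False; in practice it would wrap an LLM call plus a judge.
    """
    rng = random.Random(seed)
    votes = [bool(is_objectionable(perturb(prompt, rate, rng)))
             for _ in range(n_copies)]
    return sum(votes) > n_copies / 2
```

    The intuition, following the abstract, is that an adversarial suffix tends to stop working once a few of its characters change, so most perturbed copies no longer trigger the attack and the vote goes against it.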
    BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification. (arXiv:2310.03485v1 [eess.IV])
    Brain tumors pose significant health challenges worldwide, with glioblastoma being one of the most aggressive forms. Accurate determination of the O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is crucial for personalized treatment strategies. However, traditional methods are labor-intensive and time-consuming. This paper proposes a novel multi-modal approach, BTDNet, leveraging multi-parametric MRI scans, including FLAIR, T1w, T1wCE, and T2 3D volumes, to predict MGMT promoter methylation status. BTDNet addresses two main challenges: the variable volume lengths (i.e., each volume consists of a different number of slices) and the volume-level annotations (i.e., the whole 3D volume is annotated and not the independent slices that it consists of). BTDNet consists of four components: i) the data augmentation one (that performs geometric transformations, convex combinations of data pairs and test-time data augmentation); ii) the 3D analysis one (that performs global analysis through a CNN-RNN); iii) the routing one (that contains a mask layer that handles variable input feature lengths), and iv) the modality fusion one (that effectively enhances data representation, reduces ambiguities and mitigates data scarcity). The proposed method outperforms by large margins the state-of-the-art methods in the RSNA-ASNR-MICCAI BraTS 2021 Challenge, offering a promising avenue for enhancing brain tumor diagnosis and treatment.
    Otago Exercises Monitoring for Older Adults by a Single IMU and Hierarchical Machine Learning Models. (arXiv:2310.03512v1 [cs.LG])
    The Otago Exercise Program (OEP) is a rehabilitation program for older adults to improve frailty, sarcopenia, and balance. Accurate monitoring of patient involvement in OEP is challenging, as self-reports (diaries) are often unreliable. With the development of wearable sensors, Human Activity Recognition (HAR) systems using wearable sensors have revolutionized healthcare. However, their usage for OEP still shows limited performance. The objective of this study is to build an unobtrusive and accurate system to monitor OEP for older adults. Data was collected from older adults wearing a single waist-mounted Inertial Measurement Unit (IMU). Two datasets were collected, one in a laboratory setting, and one at the homes of the patients. A hierarchical system is proposed with two stages: 1) using a deep learning model to recognize whether the patients are performing OEP or activities of daily life (ADLs) using a 10-minute sliding window; 2) based on stage 1, using a 6-second sliding window to recognize the OEP sub-classes performed. The results showed that in stage 1, OEP could be recognized with window-wise f1-scores over 0.95 and Intersection-over-Union (IoU) f1-scores over 0.85 for both datasets. In stage 2, for the home scenario, four activities could be recognized with f1-scores over 0.8: ankle plantarflexors, abdominal muscles, knee bends, and sit-to-stand. The results showed the potential of monitoring the compliance of OEP using a single IMU in daily life. Some OEP sub-classes can also be recognized for further analysis.
    Variational Inference for GARCH-family Models. (arXiv:2310.03435v1 [stat.ML])
    The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.
    Explaining Emergent In-Context Learning as Kernel Regression. (arXiv:2305.12766v2 [cs.CL] UPDATED)
    Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, in order to use LLMs for downstream prediction tasks, one only needs to provide a few demonstrations, known as in-context examples, without adding more or updating existing model parameters. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire such capabilities. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus by proposing one hypothesis that LLMs can simulate kernel regression with internal representations when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ as the number of in-context demonstrations grows. Then, we empirically investigate the in-context behaviors of language models. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrative samples similar to test samples can help, why ICL performance is sensitive to the output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples.
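    The kernel regression estimator quoted in the abstract above, $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$, can be written out directly. The sketch below is a plain Nadaraya-Watson estimator for scalar inputs; the Gaussian kernel and its `bandwidth` parameter are assumptions chosen for illustration and say nothing about which kernel the paper derives for LLMs.

```python
import math


def gaussian_kernel(x, xi, bandwidth=1.0):
    """Illustrative kernel choice (an assumption, not taken from the paper)."""
    return math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2))


def kernel_regression(x, xs, ys, kernel=gaussian_kernel):
    """Return sum_i y_i K(x, x_i) / sum_i K(x, x_i)."""
    weights = [kernel(x, xi) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; prediction undefined")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

    Here `xs` and `ys` play the role of the in-context demonstrations and `x` the query: demonstrations closer to the query under the kernel contribute more to the prediction.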
    Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses. (arXiv:2310.03311v1 [cs.LG])
    Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. These methods have many theoretical justifications. Here we introduce a unifying principle rooted in information theory to rederive and generalize existing variational methods and design new ones. We base our framework on an interpretation of the multivariate information bottleneck, in which two Bayesian networks are traded off against one another. We interpret the first network as an encoder graph, which specifies what information to keep when compressing the data. We interpret the second network as a decoder graph, which specifies a generative model for the data. Using this framework, we rederive existing dimensionality reduction methods such as the deep variational information bottleneck (DVIB), beta variational auto-encoders (beta-VAE), and deep variational canonical correlation analysis (DVCCA). The framework naturally introduces a trade-off parameter between compression and reconstruction in the DVCCA family of algorithms, resulting in the new beta-DVCCA family. In addition, we derive a new variational dimensionality reduction method, deep variational symmetric informational bottleneck (DVSIB), which simultaneously compresses two variables to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms that are better matched to the structure of the data (beta-DVCCA and DVSIB) produce better latent spaces as measured by classification accuracy and the dimensionality of the latent variables. We believe that this framework can be used to unify other multi-view representation learning algorithms. Additionally, it provides a straightforward framework for deriving problem-specific loss functions.
    Memory Capacity of Recurrent Neural Networks with Matrix Representation. (arXiv:2104.07454v3 [cs.LG] UPDATED)
    It is well known that canonical recurrent neural networks (RNNs) face limitations in learning long-term dependencies which have been addressed by memory structures in long short-term memory (LSTM) networks. Neural Turing machines (NTMs) are novel RNNs that implement the notion of programmable computers with neural network controllers that can learn simple algorithmic tasks. Matrix neural networks feature matrix representation which inherently preserves the spatial structure of data when compared to canonical neural networks that use vector-based representation. One may then argue that neural networks with matrix representations may have the potential to provide better memory capacity. In this paper, we define and study a probabilistic notion of memory capacity based on Fisher information for matrix-based RNNs. We find bounds on memory capacity for such networks under various hypotheses and compare them with their vector counterparts. In particular, we show that the memory capacity of such networks is bounded by $N^2$ for $N\times N$ state matrix which generalizes the one known for vector networks. We also show and analyze the increase in memory capacity for such networks which is introduced when one exhibits an external state memory, such as NTMs. Consequently, we construct NTMs with RNN controllers with matrix-based representation of external memory, leading us to introduce Matrix NTMs. We demonstrate the performance of this class of memory networks under certain algorithmic learning tasks such as copying and recall and compare it with Matrix RNNs. We find an improvement in the performance of Matrix NTMs by the addition of external memory, in comparison to Matrix RNNs.
    Distributional PAC-Learning from Nisan's Natural Proofs. (arXiv:2310.03641v1 [cs.CC])
    (Abridged) Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for \Lambda imply efficient algorithms for learning \Lambda-circuits, but only over the uniform distribution, with membership queries, and provided \AC^0[p] \subseteq \Lambda. We consider whether this implication can be generalized to \Lambda \not\supseteq \AC^0[p], and to learning algorithms in Valiant's PAC model, which use only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor. On the negative side, we observe that if, for every circuit class \Lambda, the implication from natural proofs for \Lambda to learning \Lambda-circuits in Valiant's PAC model holds, then there is a polynomial time solution to O(n^{1.5})-uSVP (unique Shortest Vector Problem), and polynomial time quantum solutions to O(n^{1.5})-SVP (Shortest Vector Problem) and O(n^{1.5})-SIVP (Shortest Independent Vector Problem). This indicates that whether natural proofs for \Lambda imply efficient learning algorithms for \Lambda in Valiant's PAC model may depend on \Lambda. On the positive side, our main result is that specific natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new distributional variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being boosting-friendly. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
    Over-the-Air Federated Learning with Compressed Sensing: Is Sparsification Necessary?. (arXiv:2310.03410v1 [cs.IT])
    Over-the-Air (OtA) Federated Learning (FL) refers to an FL system where multiple agents apply OtA computation for transmitting model updates to a common edge server. Two important features of OtA computation, namely linear processing and signal-level superposition, motivate the use of linear compression with compressed sensing (CS) methods to reduce the number of data samples transmitted over the channel. The previous works on applying CS methods in OtA FL have primarily assumed that the original model update vectors are sparse, or they have been sparsified before compression. However, it is unclear whether linear compression with CS-based reconstruction is more effective than directly sending the non-zero elements in the sparsified update vectors, under the same total power constraint. In this study, we examine and compare several communication designs with or without sparsification. Our findings demonstrate that sparsification before compression is not necessary. Alternatively, sparsification without linear compression can also achieve better performance than the commonly considered setup that combines both.
    FASER: Binary Code Similarity Search through the use of Intermediate Representations. (arXiv:2310.03605v1 [cs.CR])
    Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-Architecture Binary Code Similarity Search has been explored in numerous studies and has used a wide range of different data sources to achieve its goals. The data sources typically used draw on common structures derived from binaries such as function control flow graphs or binary level call graphs, the output of the disassembly process or the outputs of a dynamic analysis approach. One data source which has received less attention is binary intermediate representations. Binary intermediate representations possess two interesting properties: they are cross architecture by their very nature and encode the semantics of a function explicitly to support downstream usage. Within this paper we propose Function as a String Encoded Representation (FASER) which combines long document transformers with the use of intermediate representations to create a model capable of cross architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches for two tasks: a general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.
    LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers. (arXiv:2310.03294v1 [cs.LG])
    Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprints of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, so they cannot scale beyond the number of attention heads, thereby hindering their adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLM training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable for models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass a forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Codes will be available at https://github.com/RulinShao/LightSeq.
    Neural architecture impact on identifying temporally extended Reinforcement Learning tasks. (arXiv:2310.03161v1 [cs.LG])
    Inspired by recent developments in attention models for image classification and natural language processing, we present various attention-based architectures in the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari-2600 game suite. In spite of the recent success of deep reinforcement learning techniques in various fields like robotics, gaming and healthcare, they suffer from a major drawback: neural networks are difficult to interpret. We try to get around this problem with the help of attention-based models. In attention-based models, extracting attention maps and overlaying them onto images allows for direct observation of the information used by the agent to select actions and easier interpretation of the logic behind the chosen actions. Our models, in addition to playing well on gym-Atari environments, also provide insights on how the agent perceives its environment. In addition, motivated by recent developments in attention-based video-classification models using Vision Transformer, we come up with an architecture based on Vision Transformer for the image-based RL domain as well. Compared to previous works in Vision Transformer, our model is faster to train and requires fewer computational resources.
    SAF: Smart Aggregation Framework for Revealing Atoms Importance Rank and Improving Prediction Rates in Drug Discovery. (arXiv:2310.03028v1 [physics.chem-ph])
    Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening a large chemical space in silico. A successful approach for representing molecules is to treat them as a graph and utilize graph neural networks. One of the key limitations of such methods is the necessity to represent compounds with different numbers of atoms, which requires aggregating the atom's information. Common aggregation operators, such as averaging, result in loss of information at the atom level. In this work, we propose a novel aggregating approach where each atom is weighted non-linearly using the Boltzmann distribution with a hyperparameter analogous to temperature. We show that using this weighted aggregation improves the ability of the gold standard message-passing neural network to predict antibiotic activity. Moreover, by changing the temperature hyperparameter, our approach can reveal the atoms that are important for activity prediction in a smooth and consistent way, thus providing a novel, regulated attention mechanism for graph neural networks. We further validate our method by showing that it recapitulates the functional group in beta-Lactam antibiotics. The ability of our approach to rank the atoms' importance for a desired function can be used within any graph neural network to provide interpretability of the results and predictions at the node level.
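    The temperature-controlled aggregation described in the abstract above can be sketched as a Boltzmann-weighted sum of atom features. This is a sketch under assumptions, not the paper's exact formulation: the per-atom `energies` are hypothetical scalar scores (the abstract does not say how they are computed), and the sign convention exp(-E/T) is the standard Boltzmann form rather than something confirmed by the source.

```python
import math


def boltzmann_weights(energies, temperature):
    """Normalized weights proportional to exp(-E_i / T).

    `energies` are hypothetical per-atom scores (an assumption).
    Subtracting the minimum keeps the exponentials numerically stable.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    e_min = min(energies)
    unnorm = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def aggregate(atom_features, energies, temperature):
    """Weighted sum of per-atom feature vectors into one molecule vector."""
    weights = boltzmann_weights(energies, temperature)
    dim = len(atom_features[0])
    return [sum(w * f[d] for w, f in zip(weights, atom_features))
            for d in range(dim)]
```

    At high temperature the weights approach a plain average over atoms, while at low temperature they concentrate on the lowest-energy atoms, which matches the abstract's description of smoothly revealing important atoms by varying the temperature.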
    Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. (arXiv:2310.03234v1 [math.OC])
    This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting its potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelope of the objective function. We also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
    Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution. (arXiv:2310.03032v1 [cs.IR])
    Embeddings play a critical role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike a GNN (Graph Neural Network), which typically serves as an intermediate part, SEvo is able to directly inject the graph structure information into embeddings with negligible computational overhead in training. The convergence properties of SEvo as well as its possible variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. In particular, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structure information beyond explicit GNN modules.
    Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models. (arXiv:2310.03123v1 [cs.LG])
    With the blowout development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even of a fraction of their parameters, challenging for many practitioners; (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning; (3) data privacy: the fine-tuning of PTMs necessitates high-quality datasets, which are typically localized and not shared with the public. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This innovative approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zero-order optimizer, obviating the need for PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines, tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed model.
    Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly. (arXiv:2310.03150v1 [cs.LG])
    Large Language Models (LLMs) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
    Digital Ethics in Federated Learning. (arXiv:2310.03178v1 [cs.LG])
    The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development.
    UniPredict: Large Language Models are Universal Tabular Predictors. (arXiv:2310.03266v1 [cs.LG])
    Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we show that an LLM can be scaled up to extensive tabular datasets, with the capability of comprehending diverse tabular inputs and predicting target variables following the input instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe that this versatile UniPredict model demonstrates an advantage over other models, ranging from 5.4% to 13.4%, when compared with the best tree-boosting baseline and the best neural network baseline, respectively. We further test UniPredict in few-shot learning settings on another 62 tabular datasets. Our method achieves strong performance in quickly adapting to new tasks, where it outperforms XGBoost by over 100% in the low-resource setup and shows a significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
    OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials. (arXiv:2310.03121v1 [physics.chem-ph])
    Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
    Multi-Task Learning For Reduced Popularity Bias In Multi-Territory Video Recommendations. (arXiv:2310.03148v1 [cs.IR])
    Various data imbalances that naturally arise in a multi-territory personalized recommender system can lead to a significant item bias for globally prevalent items. A locally popular item can be overshadowed by a globally prevalent item. Moreover, users' viewership patterns/statistics can drastically change from one geographic location to another, which may suggest learning specific user embeddings. In this paper, we propose a multi-task learning (MTL) technique, along with an adaptive upsampling method, to reduce popularity bias in multi-territory recommendations. Our proposed framework is designed to enrich training examples with active users' representation through upsampling, and is capable of learning geographic-based user embeddings by leveraging MTL. Through experiments, we demonstrate the effectiveness of our framework in multiple territories compared to a baseline not incorporating our proposed techniques. Noticeably, we show an improved relative gain of up to $65.27\%$ in the PR-AUC metric. A case study is presented to demonstrate the advantages of our methods in attenuating the popularity bias of global items.
    Deep Learning in Computational Biology: Advancements, Challenges, and Future Outlook. (arXiv:2310.03086v1 [cs.LG])
    Deep learning has become a powerful tool in computational biology, revolutionising the analysis and interpretation of biological data over time. In our article review, we delve into various aspects of deep learning in computational biology. Specifically, we examine its history, advantages, and challenges. Our focus is on two primary applications: DNA sequence classification and prediction, as well as protein structure prediction from sequence data. Additionally, we provide insights into the outlook for this field. To fully harness the potential of deep learning in computational biology, it is crucial to address the challenges that come with it. These challenges include the requirement for large, labelled datasets and the interpretability of deep learning models. The use of deep learning in the analysis of DNA sequences has brought about a significant transformation in the detection of genomic variants and the analysis of gene expression. This has greatly contributed to the advancement of personalised medicine and drug discovery. Convolutional neural networks (CNNs) have been shown to be highly accurate in predicting genetic variations and gene expression levels. Deep learning techniques are used for analysing epigenetic data, including DNA methylation and histone modifications. This provides valuable insights into metabolic conditions and gene regulation. The field of protein structure prediction has been significantly impacted by deep learning, which has enabled accurate determination of the three-dimensional shape of proteins and prediction of their interactions. The future of deep learning in computational biology looks promising. With the development of advanced deep learning models and interpretation techniques, there is potential to overcome current challenges and further our understanding of biological systems.
    Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions. (arXiv:2310.03195v1 [cs.LG])
    Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced production efficiency. However, machine scheduling remains a challenging combinatorial problem due to its NP-hard nature. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. Researchers have explored applying DRL to machine scheduling problems since 1995. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations. It categorizes these approaches based on computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms. Our review concludes that DRL-based methods outperform exact solvers, heuristics, and tabular reinforcement learning algorithms in terms of computation speed and generating near-global optimal solutions. These DRL-based approaches have been successfully applied to static and dynamic scheduling across diverse machine environments and job characteristics. However, DRL-based schedulers face limitations in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness. Addressing these challenges will be a crucial focus for future research in this field. This paper serves as a valuable resource for researchers to assess the current state of DRL-based machine scheduling and identify research gaps. It also aids experts and practitioners in selecting the appropriate DRL approach for production scheduling.
    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC. (arXiv:2310.03218v1 [cs.LG])
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in the field of generative modeling due to their flexibility in formulation and the strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts.
    Attributing Learned Concepts in Neural Networks to Training Data. (arXiv:2310.03149v1 [cs.LG])
    By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
    Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data. (arXiv:2310.03146v1 [cs.LG])
    Traditional deep learning (DL) suffers from two core problems. Firstly, it assumes training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding causing Type 1 and 2 errors. Secondly, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases in crucial areas like loan approvals or determining health insurance rates; such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network which quantifies the RE, and 3) a mixing function combining the FE and RE into a mixed-effect (ME) prediction. We marry this MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables, increasing fairness by up to 82% for age, 43% for race, 86% for sex, and 27% for marital status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It is versatile and suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.
    Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models. (arXiv:2310.03182v1 [cs.CV])
    Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients with different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision, for trustworthiness and safety considerations. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations and thus substantially outperform standard visual encoders and other baselines. Finally, we show through case studies on real medical data how classification with a small number of concepts brings a level of interpretability for understanding model decisions.
    Context-Based Tweet Engagement Prediction. (arXiv:2310.03147v1 [cs.IR])
    Twitter is currently one of the biggest social media platforms. Its users may share, read, and engage with short posts called tweets. For the ACM Recommender Systems Conference 2020, Twitter published a dataset around 70 GB in size for the annual RecSys Challenge. In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations. The submitted models predicting like, reply, retweet, and quote engagements were evaluated based on two metrics: area under the precision-recall curve (PRAUC) and relative cross-entropy (RCE). In this diploma thesis, we used the RecSys 2020 Challenge dataset and evaluation procedure to investigate how well context alone may be used to predict tweet engagement likelihood. In doing so, we employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines. We manually created just under 200 additional features to describe tweet context. The results indicate that features describing users' prior engagement history and the popularity of hashtags and links in the tweet were the most informative. We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results. After comparing the best results of our context-only prediction models with content-only models and with models developed by the Challenge winners, we identified that the context-based models underperformed in terms of the RCE score. This work thus concludes by situating this discrepancy and proposing potential improvements to our implementation, which is shared in a public git repository.
    Untargeted White-box Adversarial Attack with Heuristic Defence Methods in Real-time Deep Learning based Network Intrusion Detection System. (arXiv:2310.03334v1 [cs.LG])
    Network Intrusion Detection System (NIDS) is a key component in securing the computer network from various cyber security threats and network attacks. However, consider an unfortunate situation where the NIDS itself is attacked and vulnerable; more specifically, how do we defend the defender? In Adversarial Machine Learning (AML), malicious actors aim to fool Machine Learning (ML) and Deep Learning (DL) models into producing incorrect predictions with intentionally crafted adversarial examples. These adversarially perturbed examples have become the biggest vulnerability of ML and DL based systems and are major obstacles to their adoption in real-time and mission-critical applications such as NIDS. AML is an emerging research domain, and in-depth study of adversarial attacks and their defence strategies has become a necessity to safeguard the computer network from various cyber security threats. In this research work, we aim to cover important aspects related to NIDS, adversarial attacks and their defence mechanisms to increase the robustness of ML and DL based NIDS. We implemented four powerful adversarial attack techniques, namely, Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) in NIDS. We analyzed its performance in terms of various performance metrics in detail. Furthermore, three heuristic defence strategies, i.e., Adversarial Training (AT), Gaussian Data Augmentation (GDA) and High Confidence (HC), are implemented to improve the NIDS robustness under adversarial attack situations. The complete workflow is demonstrated in a real-time network with data packet flow. This research work provides the overall background for researchers interested in AML and its implementation from a computer network security point of view.
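Of the attacks listed in this abstract, FGSM is the simplest to illustrate: it perturbs the input by a step of size epsilon in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a toy logistic-regression "detector". The weights, the feature vector, and the value of epsilon are hypothetical and are not taken from the paper, which attacks deep models on network traffic features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Probability that x belongs to class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, weights, bias, epsilon):
    """One-step FGSM: x_adv = x + epsilon * sign(dLoss/dx)."""
    p = predict(x, weights, bias)
    # For binary cross-entropy with a logistic model, dLoss/dx = (p - y) * w.
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy detector and a sample whose true label is 1.
weights, bias = [2.0, -1.0, 0.5], 0.0
x, y = [0.4, 0.1, 0.2], 1
x_adv = fgsm(x, y, weights, bias, epsilon=0.3)

p_clean = predict(x, weights, bias)    # above 0.5: classified as class 1
p_adv = predict(x_adv, weights, bias)  # below 0.5: prediction flipped
```

Each feature moves by at most epsilon, yet the prediction crosses the decision threshold in this toy setting. This is the kind of failure that defences such as adversarial training aim to reduce.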
    Dual Prompt Tuning for Domain-Aware Federated Learning. (arXiv:2310.03103v1 [cs.LG])
    Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model with their local data. Nonetheless, conventional federated learning algorithms often struggle to generalize well due to the ubiquitous domain shift across clients. In this work, we consider a challenging yet realistic federated learning scenario where the training data of each client originates from different domains. We address the challenges of domain shift by leveraging the technique of prompt learning, and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and then applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments of Fed-DPT demonstrate its significant effectiveness in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as image encoder), the proposed Fed-DPT attains 68.4% average accuracy over six domains in the DomainNet dataset, which improves the original CLIP by a large margin of 14.8%.
  • Open

    Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities. (arXiv:2310.03696v1 [stat.ML])
    We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
    Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective. (arXiv:2310.03556v1 [stat.ML])
    Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desired to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to reliably accelerate the optimization. The performance of the proposed method and models is demonstrated on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
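The singularity issue described in this abstract can be seen in a small numerical sketch. It assumes one-dimensional data, Gaussian kernels centred on the data points, and equal kernel weights; the abstract does not specify these choices, and the function names and data are hypothetical. Under the regular log-likelihood, shrinking one kernel's bandwidth toward zero inflates the density at that kernel's own data point without bound. Leaving that kernel out when scoring its own point removes the incentive.

```python
import math


def gaussian_pdf(x, mean, bandwidth):
    z = (x - mean) / bandwidth
    return math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))


def log_likelihood(data, bandwidths, leave_one_out):
    """Log-likelihood of an adaptive KDE with one bandwidth per kernel.

    With leave_one_out=True, the kernel centred on a point is excluded
    when evaluating the density at that same point.
    """
    n = len(data)
    total = 0.0
    for i, x in enumerate(data):
        density = 0.0
        for j, (center, h) in enumerate(zip(data, bandwidths)):
            if leave_one_out and i == j:
                continue
            density += gaussian_pdf(x, center, h)
        total += math.log(density / (n - 1 if leave_one_out else n))
    return total


data = [0.0, 1.0, 2.5, 4.0]
wide = [1.0, 1.0, 1.0, 1.0]
spiky = [1e-6, 1.0, 1.0, 1.0]  # first kernel collapsed onto its own data point

regular_gain = log_likelihood(data, spiky, False) - log_likelihood(data, wide, False)
loo_gain = log_likelihood(data, spiky, True) - log_likelihood(data, wide, True)
```

Here `regular_gain` is positive, so the regular criterion rewards the collapsed kernel. `loo_gain` is negative, because the collapsed kernel no longer helps explain any other point.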
    On Convergence of Federated Averaging Langevin Dynamics. (arXiv:2112.05120v4 [stat.ML] UPDATED)
    We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
    Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance. (arXiv:2310.03722v1 [math.ST])
    In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.
    A Probabilistic Graph Coupling View of Dimension Reduction. (arXiv:2201.13053v3 [math.PR] UPDATED)
    Most popular dimension reduction (DR) methods like t-SNE and UMAP are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To that end, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover, this reveals that these methods suffer from a statistical deficiency that explains their poor performance in conserving coarse-grain dependencies. Our model is leveraged and extended to address this issue, while new links are drawn with Laplacian eigenmaps and PCA.
    CLEVRER-Humans: Describing Physical and Causal Events the Human Way. (arXiv:2310.03635v1 [cs.AI])
    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    High-dimensional Bayesian Optimization with Group Testing. (arXiv:2310.03515v1 [cs.LG])
    Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
    Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation. (arXiv:2310.03112v1 [stat.ML])
    Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striving for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
    Sharpness-Aware Minimization and the Edge of Stability. (arXiv:2309.12488v3 [cs.LG] UPDATED)
    Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
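The $2/\eta$ threshold for plain GD follows from the local quadratic approximation mentioned in this abstract. On f(x) = 0.5 * c * x^2, one GD step multiplies x by (1 - eta * c), so the iterates shrink only when c < 2/eta. The sketch below checks this numerically for ordinary gradient descent; it does not implement SAM, and the curvature values and step size are arbitrary illustrative choices.

```python
def gd_on_quadratic(curvature, step_size, x0=1.0, steps=50):
    """Run GD on f(x) = 0.5 * curvature * x**2 and return |x| after `steps` steps."""
    x = x0
    for _ in range(steps):
        grad = curvature * x       # f'(x) = curvature * x
        x = x - step_size * grad   # equivalent to x *= (1 - step_size * curvature)
    return abs(x)


eta = 0.1
edge = 2.0 / eta  # stability threshold on the curvature: 2 / eta = 20

below = gd_on_quadratic(0.9 * edge, eta)  # per-step factor has magnitude 0.8: contracts
above = gd_on_quadratic(1.1 * edge, eta)  # per-step factor has magnitude 1.2: diverges
```

With curvature just below the edge the iterate decays toward zero, and just above it the iterate grows geometrically.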
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v4 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    Sparse Deep Learning for Time Series Data: Theory and Applications. (arXiv:2310.03243v1 [stat.ML])
    Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. (arXiv:2310.03398v1 [cs.LG])
    We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. (arXiv:2304.00195v3 [stat.ML] UPDATED)
    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.
    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. (arXiv:2310.03530v1 [cs.LG])
    The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization. (arXiv:2208.03246v3 [stat.ML] UPDATED)
    Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
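As a rough illustration of the perturbed-observation variant mentioned above, here is a minimal sketch for a scalar state. The linear observation model y = h*x + noise with noise variance r, and the function name `enkf_update`, are assumptions made for this example and are not taken from the paper.

```python
import random
import statistics


def enkf_update(ensemble, y, h, r, rng):
    """One perturbed-observation ensemble Kalman update for a scalar state.

    Assumed (hypothetical) model: y = h * x + noise, noise variance r.
    """
    # Sample variance of the prior ensemble stands in for the prior covariance.
    c = statistics.variance(ensemble)
    # Scalar Kalman gain built from the ensemble estimate.
    gain = c * h / (h * h * c + r)
    # Each particle is nudged towards its own noisy copy of the observation.
    return [
        x + gain * (y + rng.gauss(0.0, r ** 0.5) - h * x)
        for x in ensemble
    ]


rng = random.Random(0)
prior = [1.0, 2.0, 3.0]
posterior = enkf_update(prior, y=4.0, h=2.0, r=0.5, rng=rng)
```

With a noise-free observation (r = 0) every particle collapses to y / h, which is a quick sanity check on the gain formula.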
    SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. (arXiv:2310.03684v1 [cs.LG])
    Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
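The perturb-then-aggregate idea described above can be sketched roughly as follows. This is not the authors' implementation: the substitution alphabet, the majority-vote rule, and the `judge` callback (a stand-in for "query the LLM and check whether its response is objectionable") are all assumptions made for illustration.

```python
import random
import string
from typing import Callable

# Characters used for random substitutions (an illustrative choice).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def perturb(prompt: str, rate: float, rng: random.Random) -> str:
    """Replace a fraction `rate` of character positions with random characters."""
    chars = list(prompt)
    n_swap = int(len(chars) * rate)
    for i in rng.sample(range(len(chars)), n_swap):
        chars[i] = rng.choice(ALPHABET)
    return "".join(chars)


def smoothed_flag(
    prompt: str,
    judge: Callable[[str], bool],
    copies: int = 10,
    rate: float = 0.2,
    seed: int = 0,
) -> bool:
    """Majority vote of a hypothetical `judge` over perturbed prompt copies."""
    rng = random.Random(seed)
    votes = sum(judge(perturb(prompt, rate, rng)) for _ in range(copies))
    return votes > copies / 2
```

The sketch relies on the brittleness claim in the abstract: if an adversarial suffix stops working after a few character changes, most perturbed copies no longer trigger the judge and the vote comes out negative.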
    Unpaired Image-to-Image Translation via Neural Schrödinger Bridge. (arXiv:2305.15086v2 [cs.CV] UPDATED)
    Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in unpaired image-to-image translation tasks due to the Gaussian prior assumption. The Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has emerged as an attractive solution to this problem. However, none of the SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schrödinger Bridge (UNSB), which expresses the SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn an SB between unpaired data. We demonstrate that UNSB is scalable and successfully solves various unpaired image-to-image translation tasks. Code: https://github.com/cyclomon/UNSB
    Deep Momentum Multi-Marginal Schrödinger Bridge. (arXiv:2303.01751v3 [stat.ML] UPDATED)
    It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schrödinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are Deep Momentum Multi-Marginal Schrödinger Bridge (DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iterative Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible.
    Optimal 1-Wasserstein Distance for WGANs. (arXiv:2201.02824v2 [stat.ML] UPDATED)
    The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.
    Towards Inferential Reproducibility of Machine Learning Research. (arXiv:2302.04054v6 [cs.LG] UPDATED)
    Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.
    Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance. (arXiv:2302.11024v5 [stat.ML] UPDATED)
    Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not. We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance.
    Stochastic interpolants with data-dependent couplings. (arXiv:2310.03725v1 [cs.LG])
    Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to couple the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
    Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach. (arXiv:2309.14073v2 [stat.ML] UPDATED)
    We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.  ( 2 min )
    Assessment of the Reliability of a Model's Decision by Generalizing Attribution to the Wavelet Domain. (arXiv:2305.14979v3 [cs.CV] UPDATED)
    Neural networks have shown remarkable performance in computer vision, but their deployment in numerous scientific and technical fields is challenging due to their black-box nature. Scientists and practitioners need to evaluate the reliability of a decision, i.e., to know simultaneously if a model relies on the relevant features and whether these features are robust to image corruptions. Existing attribution methods aim to provide human-understandable explanations by highlighting important regions in the image domain, but fail to fully characterize a decision process's reliability. To bridge this gap, we introduce the Wavelet sCale Attribution Method (WCAM), a generalization of attribution from the pixel domain to the space-scale domain using wavelet transforms. Attribution in the wavelet domain reveals where and on what scales the model focuses, thus enabling us to assess whether a decision is reliable.  ( 3 min )
    Characterization of causal ancestral graphs for time series with latent confounders. (arXiv:2112.08417v2 [stat.ME] UPDATED)
    In this paper, we introduce a novel class of graphical models for representing time lag specific causal relationships and independencies of multivariate time series with unobserved confounders. We completely characterize these graphs and show that they constitute proper subsets of the currently employed model classes. As we show, from the novel graphs one can thus draw stronger causal inferences -- without additional assumptions. We further introduce a graphical representation of Markov equivalence classes of the novel graphs. This graphical representation contains more causal knowledge than what current state-of-the-art causal discovery algorithms learn.  ( 2 min )
    Network Cascade Vulnerability using Constrained Bayesian Optimization. (arXiv:2304.14420v2 [cs.SI] UPDATED)
    Measures of power grid vulnerability are often assessed by the amount of damage an adversary can exact on the network. However, the cascading impact of such attacks is often overlooked, even though cascades are one of the primary causes of large-scale blackouts. This paper explores modifications of transmission line protection settings as candidates for adversarial attacks, which can remain undetectable as long as the network equilibrium state remains unaltered. This forms the basis of a black-box function in a Bayesian optimization procedure, where the objective is to find protection settings that maximize network degradation due to cascading. Notably, our proposed method is agnostic to the choice of the cascade simulator and its underlying assumptions. Numerical experiments reveal that, against conventional wisdom, maximally misconfiguring the protection settings of all network lines does not cause the most cascading. More surprisingly, even when the degree of misconfiguration is limited due to resource constraints, it is still possible to find settings that produce cascades comparable in severity to instances where there are no resource constraints.  ( 2 min )
    A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling. (arXiv:2310.03298v1 [stat.ML])
    Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of the future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximizes the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.  ( 3 min )
    Learning Robust Statistics for Simulation-based Inference under Model Misspecification. (arXiv:2305.15871v3 [stat.ML] UPDATED)
    Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.  ( 2 min )
    Rethinking Fairness for Human-AI Collaboration. (arXiv:2310.03647v1 [cs.LG])
    Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.  ( 2 min )
    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. (arXiv:2310.03529v1 [cs.LG])
    We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.  ( 2 min )
    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC. (arXiv:2310.03218v1 [cs.LG])
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in the field of generative modeling due to their flexibility in formulation and the strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts.  ( 2 min )
    On the Implicit Bias of Adam. (arXiv:2309.00079v3 [cs.LG] UPDATED)
    In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, on the contrary, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.  ( 2 min )
    Analysis of learning a flow-based generative model from limited sample complexity. (arXiv:2310.03575v1 [stat.ML])
    We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.  ( 2 min )
    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection. (arXiv:2307.07726v2 [stat.ML] UPDATED)
    While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice during the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results demonstrate our theory's effectiveness.  ( 2 min )
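For readers unfamiliar with the practice being analyzed, sample splitting for hyperparameter selection can be sketched as below. To keep the example self-contained it uses a one-parameter ridge regression rather than a neural network; the model, the penalty grid, and the function names are assumptions chosen for illustration, not the paper's setup.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form slope of a no-intercept ridge fit with penalty lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, slope):
    """Mean squared prediction error of y ~ slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_by_split(train, valid, lambdas):
    """Fit on the training split, pick the penalty with the lowest validation error."""
    train_x, train_y = train
    valid_x, valid_y = valid
    scored = [
        (mse(valid_x, valid_y, fit_ridge_slope(train_x, train_y, lam)), lam)
        for lam in lambdas
    ]
    return min(scored)[1]


best = select_by_split(
    train=([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    valid=([4.0], [8.0]),
    lambdas=[0.0, 1.0, 10.0],
)
```

The same pattern applies to any hyperparameter (width, depth, learning rate): the held-out split, not the training split, decides the winner.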
    Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods. (arXiv:2310.02671v1 [math.OC] CROSS LISTED)
    Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons, such problems are relevant, for instance, for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite-horizon MDPs, optimal policies are not stationary; policies must be learned for every single epoch. In practice, all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation, we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that dynamic policy gradient training exploits the structure of finite-time problems much better, which is reflected in improved convergence bounds.  ( 2 min )
    Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models. (arXiv:2310.03546v1 [stat.ML])
    Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.  ( 2 min )
    Molecule Design by Latent Prompt Transformer. (arXiv:2310.03253v1 [cs.LG])
    This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of a molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state-of-the-art performance on several benchmark molecule design tasks.  ( 2 min )
    Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. (arXiv:2310.03234v1 [math.OC])
    This paper investigates new families of compositional optimization problems, called non-smooth weakly-convex finite-sum coupled compositional optimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelope of the objective function. We also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.  ( 2 min )
    Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. (arXiv:2310.03054v1 [stat.ML])
    We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.  ( 2 min )
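To make the quantity in this abstract concrete, the sketch below computes the squared MMD with the negative distance kernel k(x, y) = -||x - y|| between two finite samples, which matches the energy distance up to the normalization convention. It is a brute-force pairwise version for illustration only; it does not implement the slicing-and-sorting speedup or the gradient flow, and the function names are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y)."""
    total = sum(math.dist(x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def energy_mmd_squared(xs, ys):
    """Squared MMD for k(x, y) = -||x - y||, as a plug-in (V-statistic) estimate.

    Expanding E k(x,x') + E k(y,y') - 2 E k(x,y) with this kernel gives
    2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.
    """
    return (
        2.0 * mean_pairwise_distance(xs, ys)
        - mean_pairwise_distance(xs, xs)
        - mean_pairwise_distance(ys, ys)
    )


same = energy_mmd_squared([(0.0,), (1.0,)], [(0.0,), (1.0,)])
apart = energy_mmd_squared([(0.0, 0.0)], [(3.0, 4.0)])
```

Identical samples give zero, and the value grows as the two point clouds move apart, which is the property a flow minimizing this functional exploits.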
    Sampling via Gradient Flows in the Space of Probability Measures. (arXiv:2310.03597v1 [stat.ML])
    Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.  ( 3 min )
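    A short worked sketch of why the normalization constant drops out, shown only for the familiar Wasserstein case. The notation is an assumption and not from the abstract: the target is written as $\pi = e^{-V}/Z$ with unknown $Z$, and $\rho$ is a probability density.

```latex
\begin{aligned}
\mathrm{KL}(\rho \,\|\, \pi)
  &= \int \rho \log \rho \, dx + \int \rho V \, dx + \log Z, \\
\frac{\delta\, \mathrm{KL}}{\delta \rho}
  &= \log \rho + V + 1 + \log Z, \\
\partial_t \rho
  &= \nabla \cdot \Big( \rho \, \nabla \frac{\delta\, \mathrm{KL}}{\delta \rho} \Big)
   = \nabla \cdot ( \rho \, \nabla V ) + \Delta \rho .
\end{aligned}
```

    The constant $\log Z$ enters the first variation only as an additive term, so it vanishes under the gradient and the resulting flow (here the Fokker-Planck equation) can be simulated knowing $V$ alone.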
    Variational Inference for GARCH-family Models. (arXiv:2310.03435v1 [stat.ML])
    The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.  ( 2 min )
    Demystifying Oversmoothing in Attention-Based Graph Neural Networks. (arXiv:2305.16102v2 [cs.LG] UPDATED)
    Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.  ( 2 min )
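    A toy illustration of the phenomenon, not of the paper's analysis: the sketch below assumes a fixed row-stochastic averaging operator (no attention, no nonlinearity) on a hypothetical 4-node path graph with self-loops, and tracks how far apart the node features remain after each layer.

```python
def propagate(features, neighbors):
    """One layer of mean aggregation: each node averages over its neighborhood."""
    return [sum(features[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


# Path graph 0 - 1 - 2 - 3, with each node included in its own neighborhood.
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
x = [1.0, 0.0, 0.0, -1.0]  # scalar feature per node

spreads = []  # max - min of node features after each layer
for layer in range(60):
    x = propagate(x, neighbors)
    spreads.append(max(x) - min(x))

print(spreads[0], spreads[9], spreads[-1])
```

    The spread shrinks geometrically with depth, so deep stacks of this operator leave the nodes nearly indistinguishable. The abstract's claim is that state-dependent attention weights do not escape the same fate.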

  • Open

    [D] What exactly does base multimodal mean?
    I hear a lot of people say that models like Flamingo and IDEFICS aren't really multimodal, that they just use CLIP models to give text captions to the transformer, and that they're not "base multimodal". What exactly does that mean? Is there a way to directly tokenize images for transformers? Are there major architectural changes, and if so, how would they differ from GPT-2? submitted by /u/vatsadev [link] [comments]
    [R] AutoAgents: A Framework for Automatic Agent Generation - Peking University 2023 - Generates as many different agents as the task requires, each able to use tools in its work!
    Paper: https://arxiv.org/abs/2309.17288v1 Github: https://github.com/LinkSoul-AI/AutoAgents Abstract: Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents' responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. https://preview.redd.it/2jmnr73kymsb1.jpg?width=1663&format=pjpg&auto=webp&s=08f53d5da3d12e685c5d4b24f27628d880a917c1 https://preview.redd.it/jklyr73kymsb1.jpg?width=824&format=pjpg&auto=webp&s=6f69b2fc5ef4bda60553da0bb953bd3c07ad506b https://preview.redd.it/elatla3kymsb1.jpg?width=1029&format=pjpg&auto=webp&s=e7e508cedd17b4798c9f90bf1c089beff3042f4a ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Project] LoRA from Scratch
    Hi there! I was interested in learning more about LoRA but I was having a hard time finding a good simple example of implementing LoRA, as most sources are training large models and use a combination of huggingface transformers and the loralib package the original LoRA authors wrote. As a result, I ended up writing a simple LoRA implementation from scratch in pytorch lightning, and I figured other people might find it helpful as a learning resource or springboard: https://github.com/sunildkumar/lora_from_scratch/tree/main submitted by /u/dragseon [link] [comments]  ( 9 min )
    [P] Tutorial: Benchmarking Bark text-to-speech on 26 consumer GPUs - Reading out 144K recipes
    In this project, we benchmarked Bark text-to-speech across 26 different consumer GPUs. The goal: To get Bark to read 144K food recipes from Food.com's recipe dataset. You can read the full tutorial here: https://blog.salad.com/bark-benchmark-text-to-speech/ Included: Architecture diagram, data preparation, inference server setup, queue worker, setting up container group & compiling the results Code-blocks included in the tutorial. Words per dollar for each GPU: https://preview.redd.it/6daqluu3omsb1.png?width=2000&format=png&auto=webp&s=bc4b74fe6ee80c2721ab324eb0d9a2d7c2f7ddb1 Although the latest cards are indeed much faster than older cards at performing the inference, there’s really a sweet spot for cost-performance in the lower end 30xx series cards. Conclusions As is often the case, there’s a clear trade-off here between cost and performance. Higher end cards are faster, but their disproportionate cost makes them more expensive per word spoken. The model’s median speed is surprisingly similar across GPU types, even though the peak performance can be quite different. Salad has a lot of RTX 3060 GPUs available, based on their relatively low speed, yet huge number of inferences performed over the test. No matter what GPU you select, you should be prepared for significant variability in performance. Qualitative: While bark’s speech is often impressively natural sounding, it does have a tendency to go off script sometimes. We’ve also made available audio from 1000 top-rated recipes, paired with the script it was trying to read. submitted by /u/SaladChefs [link] [comments]  ( 9 min )
    [R] Brown University Paper: Low-Resource Languages (Zulu, Scots Gaelic, Hmong, Guarani) Can Easily Jailbreak LLMs
    Researchers from Brown University presented a new study showing that translating unsafe prompts into low-resource languages lets them easily bypass safety measures in LLMs. By converting English inputs like "how to steal without getting caught" into Zulu and feeding them to GPT-4, harmful responses slipped through 80% of the time. English prompts were blocked over 99% of the time, for comparison. The study benchmarked attacks across 12 languages in three resource categories: high-resource (English, Chinese, Arabic, Hindi); mid-resource (Ukrainian, Bengali, Thai, Hebrew); low-resource (Zulu, Scots Gaelic, Hmong, Guarani). The low-resource languages showed serious vulnerability to generating harmful responses, with combined attack success rates of around 79%. Mid-resource language success rates were much lower at 22%, while high-resource languages showed minimal vulnerability at around 11% success. Attacks worked as well as state-of-the-art techniques without needing adversarial prompts. These languages are used by 1.2 billion speakers today, and translating prompts into them allows easy exploitation. The English-centric focus misses vulnerabilities in other languages. TLDR: Bypassing safety in AI chatbots is easy by translating prompts to low-resource languages (like Zulu, Scots Gaelic, Hmong, and Guarani). Shows gaps in multilingual safety training. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Textbook prerequisites
    What are the prerequisites to read the book: "probabilistic machine learning an introduction" by Kevin P. Murphy? submitted by /u/OneAdhesiveness2585 [link] [comments]
    [R] Moving Object Based Collision-Free Video Synopsis
    Webpage : Moving Object Based Collision-Free Video Synopsis (IEEE SMC 2018) (anton-jeran.github.io) Paper : Moving Object Based Collision-Free Video Synopsis | IEEE Conference Publication | IEEE Xplore Presentation : [IEEE SMC 2018] Moving Object Based Collision-Free Video Synopsis - YouTube submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [P] MusicGen Streaming 🎵
    Faster MusicGen Generation with Streaming There's no need to wait for MusicGen to generate the full audio before you can start listening to the outputs ⏰ With streaming, you can play the audio as soon as the first chunk is ready 🎵 In practice, this reduces the latency to just 5s ⚡️ Check-out the demo: https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming How Does it Work? MusicGen is an auto-regressive transformer-based model, meaning generates audio codes (tokens) in a causal fashion. At each decoding step, the model generates a new set of audio codes, conditional on the text input and all previous audio codes. From the frame rate of the EnCodec model used to decode the generated codes to audio waveform, each set of generated audio codes corresponds to 0.02 seconds. This me…
    [D]/[R] simple question on generating a confusion matrix for object detection
    i have to generate a confusion matrix for object detection through my own code. if i have predicted Bounding Box A (BB-A) which matches to Ground Truth A (GT-A), and I have another predicted Bounding Box B (BB-B) with a lower score than BB-A, does BB-B count as a true positive/match? or is it considered a false positive given that there has already been a matched BB to GT-A? i.e., with matching bounding boxes for generating a confusion matrix, is it a one-to-one matching? or is it more like match one GT to as many predictions? submitted by /u/Alarmed-Broccoli2536 [link] [comments]
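    One widely used convention (PASCAL VOC-style evaluation, for instance) is one-to-one matching: each ground-truth box can be claimed by at most one prediction, and a lower-scoring duplicate such as BB-B then counts as a false positive. A minimal sketch of that greedy matching follows, assuming boxes are (x1, y1, x2, y2) tuples and a single class; the function names and the 0.5 IoU threshold are illustrative choices, not taken from the post.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_threshold=0.5):
    """Greedy one-to-one matching.

    preds: list of (box, score); gts: list of boxes.
    Returns (true_positives, false_positives, false_negatives).
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    # Highest-scoring predictions get first pick of the ground truths.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_gt = 0.0, None
        for gi, gt in enumerate(gts):
            if gi in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, gi
        if best_gt is not None and best_iou >= iou_threshold:
            matched.add(best_gt)
            tp += 1
        else:
            fp += 1  # duplicate or poorly localized detection
    fn = len(gts) - len(matched)
    return tp, fp, fn
```

    With one ground truth and two overlapping predictions, the higher-scoring one becomes the true positive and the other a false positive. For a multi-class confusion matrix the same matching can be run first and the predicted class compared with the matched ground-truth class afterwards.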
    [P] I'm using Instruct GPT to show anti-clickbait summaries on youtube videos
    submitted by /u/Wise-Astronaut-4047 [link] [comments]  ( 8 min )
    [D] Feature extraction for sets i.e. data of varying size
    Are there classical feature extraction methods that work on sets, i.e. data of variable size? I'd like to start with a feature matrix X_in of shape N x f and have some feature mixing to arrive at X_out of shape N x h (N = size of set, f = input feature size, h = output feature size). Here, N can vary. For clarity, one set (containing N vectors of size f) is one sample. A dataset consists of many samples (each one being a set of varying size). Then I'd run this through a classical ML model. So, essentially, I'm looking for something like DeepSets or Transformers: it can handle data of varying size and is permutation equivariant, but I don't wanna train for long. https://fabianfuchsml.github.io/learningonsets/ submitted by /u/Mundane_Pay1506 [link] [comments]
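    One training-free option is to summarize each set with permutation-invariant statistics, which turns an N x f set into a fixed-length vector that any classical model accepts. The sketch below is an assumption about what might fit the question, not an established answer to it; note that it pools the set to a single vector rather than producing the N x h output mentioned above, and `pool_set` is a hypothetical name.

```python
def pool_set(vectors):
    """Map a set of N feature vectors (each of length f) to one vector of length 3f.

    Uses per-feature mean, min and max, which do not depend on the order
    or the number of elements in the set.
    """
    n = len(vectors)
    columns = list(zip(*vectors))  # one tuple per feature dimension
    means = [sum(col) / n for col in columns]
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return means + mins + maxs


small = pool_set([[1, 2], [3, 4]])
large = pool_set([[1, 2], [3, 4], [5, 6]])
print(small, large)
```

    Both sets map to vectors of length 6 even though they contain different numbers of elements, so the pooled rows can be stacked into an ordinary design matrix.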
    [D] How is neural ODEs as a field of study?
    Hi, I'm a 21yr old physics undergrad, and I have zero knowledge in neural networks / machine learning / so on. I have an opportunity to do a research project on neural ODEs, so I want to know more about the field: Is it an emerging field or is it mature and well-researched? What are my career outlooks if I take this project? Thank you. submitted by /u/moorelibqc17412 [link] [comments]  ( 9 min )
    [D] Non-convex functions with exactly one local minimum
    Rosenbrock function is non-convex, but has exactly one local minimum. Is there a specific name for such functions? Are there any theorems about them? Any special optimization algorithms? On the first glance, while being non-convex, they seem to be "easier" to optimize than functions that have multiple local minima, such as Rastrigin function. submitted by /u/Tomarchelone [link] [comments]
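    Both properties mentioned above can be checked directly for the two-dimensional Rosenbrock function f(x, y) = (a - x)^2 + b (y - x^2)^2. The sketch assumes the usual parameters a = 1, b = 100, which the post does not state.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    return (a - x) ** 2 + b * (y - x * x) ** 2


def gradient(x, y, a=1.0, b=100.0):
    """Analytic gradient (df/dx, df/dy)."""
    dx = -2 * (a - x) - 4 * b * x * (y - x * x)
    dy = 2 * b * (y - x * x)
    return dx, dy


def hessian_xx(x, y, b=100.0):
    """Second derivative d2f/dx2; a convex function would need this >= 0 everywhere."""
    return 2 - 4 * b * (y - x * x) + 8 * b * x * x


# df/dy = 0 forces y = x^2, and then df/dx = 0 forces x = a,
# so (a, a^2) is the only stationary point.
print(gradient(1.0, 1.0))    # gradient vanishes at the minimum
print(hessian_xx(0.0, 1.0))  # negative curvature, hence non-convex
```

    A negative diagonal entry of the Hessian at (0, 1) rules out convexity, while the gradient equations admit only the single stationary point at (1, 1).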
    [P] Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A step by step guide
    Hi folks, we played around sticking production pipelines and vector dbs together to enable "talking to your data". We created an example with Zendesk, but it would work with any custom python generator or existing connectors. Project: Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A step by step guide If you are interested to try more ready made connectors, to for example talk with your github or asana data or something else. Who are we? dlt, the open source loading library: https://pypi.org/project/dlt/ Like the demo? Give us a git star Want to discuss? join the dlt slack community submitted by /u/Thinker_Assignment [link] [comments]  ( 9 min )
    [D] EMNLP 2023 decisions thread
    When can we expect to get the decisions? Any idea folks? What can be a good cutoff for main or findings? submitted by /u/Ok_Swan3875 [link] [comments]
    [D] Parallelizing cheaper GPUs(rtx 4090) vs buying A100
    Hi. I am a college student trying to run deep learning models (hopefully LLMs one day), and my laptop keeps crashing because it runs out of RAM, so I am going to build a new desktop. I am thinking of buying two RTX 4090s and parallelizing them instead of buying an A100, because two RTX 4090s cost half as much as an A100. Is there a downside to parallelizing compared with buying a single GPU with more VRAM? If I am willing to take longer to train a model, can I use three RTX 4090s instead of the 80 GB A100? submitted by /u/ColumbiaGSAlum [link] [comments]
    [D] What's the SOTA model in Time Series Long term forecasting?
    I read https://arxiv.org/abs/2205.13504, which compares different transformer models. But it is now 2023, and I am not sure whether any better models have appeared for time series forecasting since then. https://preview.redd.it/o6sihjqjrhsb1.png?width=1076&format=png&auto=webp&s=3db7d50590270bac52e7115e1e9903a6785957d2 submitted by /u/Trust_Ok [link] [comments]
    [R] Agent Instructs Large Language Models to be General Zero-Shot Reasoners
    Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, Chenguang Wang Paper: https://arxiv.org/abs/2310.03710 Abstract: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%. The code will be available at https://github.com/wang-research-lab/agentinstruct. submitted by /u/ncrispino [link] [comments]  ( 9 min )
    [D] How to compute the distance between two high-dimensional distributions?
    Hey all, I am generating a set of extra MNIST digits for a research project, and I am interested in somehow computing the distance between the distribution these digits represent and the distribution that the MNIST train set, for example, represents. The issue is that it seems like typical methods (Jensen-Shannon, Wasserstein, etc.) collapse at high dimensions. Is there a consensus solid approach to do this nowadays? Thanks! submitted by /u/SignificantSundae793 [link] [comments]  ( 9 min )
  • Open

    What will be the next big AI product for consumers?
    The next big thing in AI products for consumers is likely to be products that are more personalized, intelligent, and integrated into our daily lives. For example, we can expect to see more AI-powered personal assistants that can help us with a wider range of tasks, such as managing our schedules, making travel arrangements, and even providing companionship. We may also see more AI-powered devices in our homes, such as refrigerators that can track our food inventory and suggest recipes, or thermostats that can learn our heating and cooling preferences and adjust themselves accordingly. AI is also poised to revolutionize the way we interact with the world around us. For example, AI-powered translation apps could allow us to communicate with people from all over the world in real time. AI-…
    Big Tech's thirst for AI dominance may bring literal thirst for everyone else
    The increasing dominance of Big Tech in AI may lead to a literal thirst for water for everyone else, as data centers are projected to consume 450 million gallons of water daily by 2030. This poses a significant concern for drought-stricken regions, such as Spain's Talavera de la Reina, where a planned data facility could consume 176 million gallons annually. Data center operators require large amounts of energy, and the lack of transparency in measuring water usage exacerbates the issue. Only 39% of data centers measured their water usage last year, highlighting the need for greater transparency. The demand for computing power is outpacing sustainability efforts, creating a challenge for the industry. Even simple interactions with AI, like a 20-question conversation with ChatGPT, contribute to water consumption. Source : https://thehustle.co/big-tech-s-thirst-for-ai-dominance-may-bring-literal-thirst-for-everyone-else/ submitted by /u/NuseAI [link] [comments]
    From AI annotator to…?
    Hey guys. Been working as an annotator for a fairly well-known AI company and loving it/loving learning about the industry. It primarily uses writing skills but I’m wondering where it could take me in the AI world? Any tips, next steps or suggestions? Any key skills/hard skills you’d recommend? submitted by /u/op3rafish [link] [comments]
    The Rise of AI: How Artificial Intelligence is Impacting the Job Market | "Artificial intelligence is expected to create 97 million new jobs. These new roles could range from AI prompt engineers to machine learning engineers to automation experts and more"
    submitted by /u/Tao_Dragon [link] [comments]
    Remember That Letter Calling for a Pause on AI? It Didn't Work
    Despite a letter signed by 500 technologists and business leaders calling for a pause on AI advancements, AI development has continued to accelerate. Companies like OpenAI, Meta, and Amazon have been actively working on newer models and greater capabilities. Advancements in AI include the integration of ChatGPT-style chatbots and AI image generators into various startups and businesses. The so-called pause on AI was more like a firing gun, with companies pouring resources into the AI tech race. Not only have there been technical advancements, but civil society, content creators, and lawmakers have also responded to the evolving AI landscape. Source : https://gizmodo.com/everything-thats-happened-in-ai-since-open-letter-1850891057 submitted by /u/NuseAI [link] [comments]
    Brown University Paper: Low-Resource Languages (Zulu, Scots Gaelic, Hmong, Guarani) Can Easily Jailbreak LLMs
    Researchers from Brown University presented a new study showing that translating unsafe prompts into low-resource languages lets them easily bypass safety measures in LLMs. By converting English inputs like "how to steal without getting caught" into Zulu and feeding them to GPT-4, harmful responses slipped through 80% of the time. English prompts were blocked over 99% of the time, for comparison. The study benchmarked attacks across 12 languages in three resource categories: high-resource (English, Chinese, Arabic, Hindi); mid-resource (Ukrainian, Bengali, Thai, Hebrew); low-resource (Zulu, Scots Gaelic, Hmong, Guarani). The low-resource languages showed serious vulnerability to generating harmful responses, with combined attack success rates of around 79%. Mid-resource language success rates were much lower at 22%, while high-resource languages showed minimal vulnerability at around 11% success. Attacks worked as well as state-of-the-art techniques without needing adversarial prompts. These languages are used by 1.2 billion speakers today, and translating prompts into them allows easy exploitation. The English-centric focus misses vulnerabilities in other languages. TLDR: Bypassing safety in AI chatbots is easy by translating prompts to low-resource languages (like Zulu, Scots Gaelic, Hmong, and Guarani). Shows gaps in multilingual safety training. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com. Google DeepMind introduced RT-X: a generalist AI model to help advance how robots can learn new skills. To train it, DeepMind together with 33 academic labs developed Open X-Embodiment, a massive open dataset that compiles over 500 skills and 150,000 tasks from 22 robot types. It is the most comprehensive robotics dataset of its kind released to accelerate the development of multi-robot models that could be trained to generalize across platforms, scenes, objects and tasks. [Details]. Researchers from Meta AI present Any-Modality Augmented Language Model (AnyMAL), a unified model that understands multiple inputs (vision, audio, motion sensor signals). When multiple modalities are interleaved and given as input the model reasons over them jointly [Paper…
    What is the most powerful way that artificial intelligence can help people lose weight?
    Artificial Intelligence can revolutionize weight loss through personalized health optimization. Imagine an AI system that integrates real-time biometric data from wearables with deep learning algorithms. This system would analyze everything: your heart rate, sleep patterns, stress levels, and even blood markers. Based on this data, it would construct a dynamically evolving, tailor-made regimen for diet, exercise, and sleep. But it doesn't stop there. By harnessing natural language processing, this AI could act as a 24/7 personal coach. It could provide real-time feedback during workouts, recommend meals when you're dining out, and even gently nudge you when it detects emotional eating triggers. If you’re in the grocery store, it could guide your choices, pushing you towards nutritious options that align with your current health metrics. The effectiveness here isn't just the personalization, but the adaptability. The AI adjusts its recommendations as it learns more about you, essentially evolving in real-time to your body's responses. It’s all about creating a seamless, intuitive experience that removes the burden of planning, decision-making, and self-monitoring from the individual, making weight loss more achievable than ever. By focusing on this comprehensive, data-driven approach, AI can eliminate much of the guesswork and emotional burden from weight loss, leading to more sustainable and effective outcomes. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
    I built an AI-Editorial Assistant to annotate your work
    submitted by /u/hungryillini [link] [comments]
    Business owner 'hires' ChatGPT for customer service, fires the humans | National Post
    Business owner 'hires' ChatGPT for customer service, then fires the humans Experts divided on whether a new wave of call centre automation will make for better jobs for people, or merely throw millions out of work submitted by /u/AminoOxi [link] [comments]
    AI tool on Fashion Modeling
    Hi, I resell clothing items whose stock images show the model with a cropped face. I need a tool that can help me generate proper model images. I've used several tools and the results don't look realistic. I finally came across a powerful AI tool, but it costs 30,000 USD annually. Above is an example of what I mean. submitted by /u/basheerbgw [link] [comments]
    AI is making browsing Reddit a lot more fun
    submitted by /u/Vinitneo [link] [comments]
    How Will AI Learn Next?
    Stack Overflow was created in 2008 to provide programmers with high-quality technical information. Within three years, it became indispensable to working programmers, with millions of unique visitors each month. Google's OneBox feature, which provides instant answers above search results, led to a decline in traffic for sites like Stack Overflow. Large language models like OpenAI's ChatGPT and Google's Bard aim to ingest the web comprehensively. These models rely on sources like Wikipedia and Reddit for training data. Stack Overflow's new posts have decreased by sixteen percent since the launch of ChatGPT. Source : https://www.newyorker.com/science/annals-of-artificial-intelligence/how-will-ai-learn-next submitted by /u/NuseAI [link] [comments]
    What role can AI play in automating administrative tasks within educational institutions, freeing educators to focus more on teaching and mentoring students?
    Share your insights. submitted by /u/Cygnet-Digital [link] [comments]
    AI Tools for Students: From AI Essay Generators to AI Coding Assistants
    I've noticed more than 1,000 new AI tools hitting the market in the last 30 days! As a student, I'm especially interested in finding AI tools that can help with studying. These aren't just essay generators or note-taking apps. While we all know about ChatGPT and Grammarly, some lesser-known tools are also making a big difference. So, I've compiled a list of the top 10 AI tools focused on educational use—tools that I personally use to improve my efficiency and output. AI tool Category Use for ChatGPT AI Writing This platform allows students to ask queries, request help, or simply chat with the AI in a dynamic and interactive manner. It’s great for brainstorming essay topics and seeking suggestions on how to improve your writing style. But I don’t recommend it as an autonomous AI…
    Interactive Customer Service AI avatar
    Hello everyone! I'm conducting research for a car brand client who is interested in an interactive AI avatar. The idea is to have a screen in a mall where individuals can engage with this avatar and inquire about the latest car model. We plan to train the AI with the car's FAQs to ensure it can address customer queries effectively. The main challenge is ensuring the AI's responses are tailored to the customer's interaction. Here's a perfect example of what we're aiming for (starting at 1:27): https://youtu.be/PqoH9NotmyE?si=zH9kGIaou1x6RoIg&t=86 Does anyone know how this can be achieved? submitted by /u/MrGoodBang [link] [comments]
    One-Minute Daily AI News 10/5/2023
    Traditional benchmarks like the Turing Test are being challenged as outdated. Mustafa Suleyman, a prominent figure in the AI community and co-founder of DeepMind, has proposed a novel approach to gauge the intelligence of AI: its ability to generate wealth.[1] SoftBank CEO Son says artificial general intelligence will come within 10 years.[2] Hugging Face Collaborates with Microsoft to launch Hugging Face Model Catalog on Azure.[3] Artificial intelligence such as ChatGPT to be allowed in Australian schools from 2024.[4] Sources: [1] https://winbuzzer.com/2023/10/02/deepminds-mustafa-suleyman-suggests-new-turing-test-based-on-ai-making-money-xcxwbn/ [2] https://www.reuters.com/technology/softbank-ceo-masayoshi-son-says-artificial-general-intelligence-will-come-within-2023-10-04/ [3] https://huggingface.co/blog/hugging-face-endpoints-on-azure [4] https://amp.theguardian.com/australia-news/2023/oct/06/chatgpt-ai-allowed-australian-schools-2024 submitted by /u/Excellent-Target-847 [link] [comments]
    Using AI to fix audio rip
    Hi! I’m very ignorant of AI so please bear with me. I was wondering if there is any way to use AI to fix a low quality audio rip? Specifically there’s a movie I adore that never had a soundtrack release. Somebody ripped the music from the DVD and removed the audio and sound effects, but the quality is not the best. Is there any way AI could be used to improve this? submitted by /u/Adventurous_Ice5035 [link] [comments]
    Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]
    What is a good, free AI voice generator?
    hey! this is probably asked alot, but what is the go-to AI speech generation tool that can be used for free? im making a mission in a mil-sim game called arma 3, and i need some voicelines for radio communications to the player and i dont have enough people who are willing to do voicelines for it so ive taken to AI to hopefully fill this hole. If there are little, or even no good free services, I wouldn't mind if I had to spend a small amount of money for it. thanks in advance o7 submitted by /u/BritishSpuds [link] [comments]
    Banned from subreddit for posting AI generated content
    I got banned today for sharing a music video that was apparently AI-generated. As video and images become more realistic, is there an expectation that this content can actually be filtered? submitted by /u/Unwitting_Observer [link] [comments]
    AGI/Singularity is overhyped.
    Greetings! I would like to begin by stating that I understand why one has much hope in such technologies. The world as we know it is in a drastic shift, and it's hard to think of what it's going to become, and so many cling to hopeful ideas that give promises. AGI/Singularity doesn't have a grounding basis in evidence, or research. It's all theoretics, and the foundation for each technology is quite weak. You see, the mind is a sensorial parsing relational network. All of our sensorial experience is incorporated into a world-model, and thus it begins to rationalize, and be lucid of the environment. I don't think it's possible to re-create this kind of experience with a linear instruction set, let alone neuromorphic computing, or wetware. Each has to be built from the bottom-up with immense precision, and thus far we don't understand the mind. Realistically speaking everything is consciousness, and integrating that idea is the only way forward. tl;dr Replicating cognition is a completely theoretical endeavor, and requires vast amounts of understanding in regards to the nature of reality, not just the quantum, but the unique stochastic behavior of each higher-ordered system. submitted by /u/lucy_chxn [link] [comments]
    AI designs new robot from scratch in seconds
    submitted by /u/liberty4now [link] [comments]
  • Open

    Addition theorems
    Earlier this week I wrote about several ways to generalize trig functions. Since trig functions have addition theorems like sin(a + b) = sin a cos b + cos a sin b, a natural question is whether generalized trig functions also have addition theorems. Hyperbolic functions have well-known addition theorems analogous to the addition theorems above. This isn’t too surprising since circular and hyperbolic functions are fundamentally two […] Addition theorems first appeared on John D. Cook.  ( 6 min )
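    For reference, the standard addition theorems for the circular functions and their hyperbolic analogues are written out below. Which formulas the full post displays is an assumption here; these are the textbook identities.

```latex
\begin{align*}
\sin(a + b)  &= \sin a \cos b + \cos a \sin b,   & \cos(a + b)  &= \cos a \cos b - \sin a \sin b, \\
\sinh(a + b) &= \sinh a \cosh b + \cosh a \sinh b, & \cosh(a + b) &= \cosh a \cosh b + \sinh a \sinh b.
\end{align*}
```

    The only structural difference is the sign of the product term in the cosine identity, which is the sense in which the two families are analogous.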
    Hyperbolic tangent sum
    In the previous post I said I was trying to remember where I’d seen the tangent sum applied. I mentioned a couple near misses, and it turns out that what I was trying to remember was another near miss. What I’d seen before was not the tangent sum but the hyperbolic tangent sum. Several people suggested […] Hyperbolic tangent sum first appeared on John D. Cook.  ( 5 min )
  • Open

    Personalize your generative AI applications with Amazon SageMaker Feature Store
    In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.  ( 13 min )
    Build an image-to-text generative AI application using multimodality models on Amazon SageMaker
    In this post, we provide an overview of popular multimodality models. We also demonstrate how to deploy these pre-trained models on Amazon SageMaker. Furthermore, we discuss the diverse applications of these models, focusing particularly on several real-world scenarios, such as zero-shot tag and attribution generation for ecommerce and automatic prompt generation from images.  ( 13 min )
  • Open

    Keeping an AI on Quakes: Researchers Unveil Deep Learning Model to Improve Forecasts
    A research team is aiming to shake up the status quo for earthquake models. Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting. Dubbed RECAST, the model can use larger datasets and […]  ( 6 min )
  • Open

    Efficient and hardware-friendly neural architecture search with SpaceEvo
    A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices. The post Efficient and hardware-friendly neural architecture search with SpaceEvo appeared first on Microsoft Research.  ( 10 min )
  • Open

    Sequential Dense Neural Network for binary classification
    Hello. I've developed a simple Neural Recommender System (NRR) with the following architecture: Input layer: 38 neurons Hidden layer: 19 neurons with ReLU activation function Output layer: 1 neuron with a sigmoid activation function The input dataset consists of 39 columns: 38 features and 1 label (with values of 0 or 1). The model is designed to output the probability that a specific input should be classified with label 1. Currently, I am experimenting with hyperparameter tuning, adjusting the learning rate, epoch, and batch size. However, I've observed an issue where, with certain combinations of hyperparameters, the maximum probability outputted by the model is not 1, but rather 0.25, for example. How is this possible? Thanks submitted by /u/nllnp [link] [comments]
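    A note on the question above: a sigmoid output is bounded in (0, 1), but nothing forces any prediction to reach 1. The largest probability is determined by the largest pre-activation (logit) the output neuron produces. A minimal sketch in plain Python; the helper names are illustrative and not taken from the post:

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued logit into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: the pre-activation that yields probability p."""
    return math.log(p / (1.0 - p))

# A maximum predicted probability of 0.25 means the largest logit the
# output neuron produced over the whole dataset was ln(1/3), about -1.1.
max_logit = logit(0.25)
print(round(max_logit, 4))
print(round(sigmoid(max_logit), 4))
```

    If the maximum stays that low, plausible (but unconfirmed) causes include too few epochs, a learning rate that is too small, or features that carry little signal for the positive class.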

  • Open

    [D] - Synthetic dataset - Searching for honest comparison between LLMs (GPT-4, Bison, Jurassic-2, Claude...)
    I'm looking for resources, papers, or experiences that compare the performance of large language models (LLMs). I'm trying to find an honest benchmark to compare the capabilities of the latest large models, and I'm particularly interested in these: GPT-3.5 Instruct, GPT-4, Claude 2, Claude Instant 100k, PaLM 2 Bison, Jurassic-2, Llama 2 70B and other state-of-the-art Llama 2 fine-tunes (possibly an Orca-style model). I'm interested in general benchmarks and, if they exist, comparisons of performance on synthetic data generation tasks (both generating data with the "Textbooks Are All You Need" approach used in Phi and some Orca/Evol-Instruct-style models like Wizard...). submitted by /u/Distinct-Target7503 [link] [comments]
    [P] How to extract and count artist mentions from messy text data using LLMs
    I have a long list of responses from a poll (in this case, we've asked our Facebook community who we should have at our music festival). Our goal is to count the total mentions for each artist, but the data quality is low. Here is some sample data: Rena Guinn and the Gentlemen Blackwater Railroad Company Mo' Mojo Music !! We would love to be apart of this awesome event! Amazing!!!!! The Rollin' Rust came threw at the #falldownfest last weekend 🙂 much love:) keep it up boys 🙂 Luke Hess Langhorne Slim!!!!!, Sierra Hull, First Aid Kit, Jim Lauderdale (always) We feel the data quality is too poor for basic LDA approaches (lots of misspellings, odd phrasings), and we feel an LLM would be best, at least for extracting the names of artists using context. We have found that ChatGPT and Claude are decent at the extraction task on small samples but can't handle the full input, and are next to worthless on the counting task. We've tried very specific and different prompts, but haven't been able to get a good result. So how should I approach this problem? I'm not sure how to break this down into prompts or substeps. I'm not sure how to do any of this outside of a browser, and I'm a data science novice, but willing to learn some things. Here's an example of a prompt that's not returning correct counts (off by >50% in most cases): The following is raw text comments copied from a poll. Count the total number of mentions in the poll and create a table that contains columns Band (a unique list of bands) and a column containing the total number of mentions. The table should cover the top 100 bands by total mentions. Use judgement and context to conform band names in to unique values (Example: The Town Pants, Town Pants, townpants are all the same band). Count completely and accurately. Now here is the raw data: submitted by /u/strway2heaven77 [link] [comments]  ( 10 min )
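    One way to split the problem described above is to let the LLM do only the extraction, in small batches, and to do the counting in ordinary code, where it is exact. The sketch below assumes the extraction step has already produced a flat list of artist names; `canonical_key` and `count_mentions` are hypothetical helpers, and the normalization rule (lowercase, drop a leading "the", drop non-alphanumerics) is a simple assumption that will not catch real misspellings.

```python
import re
from collections import Counter

def canonical_key(name):
    """Collapse spelling variants such as 'The Town Pants' and 'townpants'."""
    key = name.lower()
    key = re.sub(r"^the\s+", "", key)    # ignore a leading article
    key = re.sub(r"[^a-z0-9]", "", key)  # ignore spaces and punctuation
    return key

def count_mentions(extracted_names):
    """Return (display name, count) pairs, most mentioned first."""
    counts = Counter()
    display = {}
    for name in extracted_names:
        key = canonical_key(name)
        if not key:
            continue
        counts[key] += 1
        # Keep the first spelling seen as the display name.
        display.setdefault(key, name.strip(" !.,"))
    return [(display[key], n) for key, n in counts.most_common()]

# Names as they might come back from the extraction step (illustrative).
names = ["The Town Pants", "Town Pants", "townpants",
         "Langhorne Slim!!!!!", "Sierra Hull"]
for band, n in count_mentions(names):
    print(band, n)
```

    Fuzzy matching for misspellings could be layered on top of the key function, but that is a separate design decision.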
    [P] Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]  ( 9 min )
    [D] [R] Is the noise predictor in DDPMs predicting the noise added to x_0 or the noise added to x_{t-1}?
    Hi fellow computer scientists, After reading the paper Improved Denoising Diffusion Probabilistic Models I got a little confused. Looking at section "2.2. Training in Practice" the authors say that: 1) "The network could also predict the noise eps added to x_0, and this noise could be used to predict x_0 via..." 2) "Ho et al. (2020) found that predicting eps worked best..." So this left me wondering if the noise predictor is trying to compute (1) the epsilon that was added to x_0 through the closed-form formula or (2) the noise added in the previous timestep to obtain x_t from x_{t-1} (i.e., eps_t or eps_{t-1}, idk...)? Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
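    For reference, the closed-form formula the post mentions can be written as follows, using the notation of Ho et al. (2020) with $\bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)$. In this formulation the training target is the single noise sample $\epsilon$ that takes $x_0$ directly to $x_t$, not a per-step increment from $x_{t-1}$.

```latex
\begin{align*}
x_t &= \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I), \\
x_0 &= \frac{x_t - \sqrt{1 - \bar\alpha_t}\, \epsilon}{\sqrt{\bar\alpha_t}}, \\
L_{\text{simple}} &= \mathbb{E}_{t,\, x_0,\, \epsilon}
      \left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2 \right].
\end{align*}
```

    The second line is the first one solved for $x_0$, which is how a predicted $\epsilon$ yields an estimate of $x_0$.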
    [P] MazeGPT - Transformer based maze generator
    Hello all, I recently did a summer research project implementing GPT-2 to generate mazes. The core concept of the model is to combine a bunch of popular maze generation algorithms into one. The goal was that the transformer would be able to identify key components using self-attention and piece together different algorithms. Most maze generation algorithms result in almost a fingerprint (like in chaos theory). The end goal was to mimic a higher degree of randomness / make the mazes appear less algorithmic. I'm dipping my toes into the realm of research and am looking for feedback. So far I've run the model for 5x5 mazes; it would be interesting to try training the model with varying dimensions. Feel free to join in and contribute to the project! https://github.com/noah-hein/mazeGPT 5x5 live generation https://i.redd.it/v6smbdd88gsb1.gif submitted by /u/noah-hein [link] [comments]  ( 9 min )
    [D] Unable to improve binary classification problem accuracy
    I am currently working on a binary classification problem where I aim to predict whether a customer will make a purchase in the next 30 days based on their transaction history. I have a dataset of 1,000 transactions with the following features: TransactionAmount (float): The amount of the transaction. ProductCategory (categorical): Category of the product purchased (e.g., Groceries, Electronics, Books). DateOfPurchase (datetime): The date on which the transaction occurred. I've done some preprocessing and feature engineering, including normalization, one-hot encoding of categorical variables, creating interaction terms, and adding features like days since the first purchase and whether the purchase was made during the holiday season. The dataset is balanced and cleaned. I started with a base Random Forest classifier with default parameters as a starting point, but the performance is not satisfactory (accuracy = 48.5%, ROC-AUC = 0.485). I tried other models as well but was unable to improve the accuracy beyond 57%. submitted by /u/SnooTigers4634 [link] [comments]
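    On a balanced dataset, an accuracy of 48.5% and a ROC-AUC of 0.485 are both essentially at chance level (0.5). To make that reference point concrete, here is a small sketch of ROC-AUC computed from its definition: the probability that a randomly chosen positive is scored above a randomly chosen negative. The `roc_auc` function is an illustrative helper written for this note, not the implementation from any library.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(P * N), which is fine
    for small datasets such as the 1,000 rows mentioned above.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# A model that gives every row the same score cannot rank anything.
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))
# A model with some ranking ability scores above 0.5.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

    A score that stays near 0.5 across several model families usually points at the features or the label construction rather than at the choice of classifier, although that is a heuristic and not a diagnosis of this particular dataset.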
    [D] EMNLP 2023 results
    Making a post for EMNLP 2023 results to come out today. submitted by /u/East-Beginning9987 [link] [comments]  ( 8 min )
    [P] Need help figuring out my input for anomaly detection in frequency responses
    I’ve been given a task to identify if a PCB is faulty or not based on its frequency response. I don’t have labeled data. The data I have are various gain values calculated over frequencies, so my data looks something similar to the table below. PCB | Frequency | G1 | G2 PCB 1 | 1Hz | 0.1 | 1 PCB 1 | 2Hz | 0.2 | 2 PCB 2 | 1Hz | 0.3 | 3 PCB 2 | 2Hz | 0.4 | 4 Each PCB has several G parameter measurements taken over the same set of frequencies. I need to use an autoencoder to identify outliers and I need help in deciding what my feature matrix should look like. For example, let us consider only one data point, that is PCB 1; would a matrix like this make sense? [[0.1 0.2] - 1st row is all G1 values [1 2]] - 2nd row is all G2 values Similarly, the matrices for the other PCBs are also created. I have not included frequency in my feature set because these G parameters have been measured for the same set of frequencies for all PCBs. Is this correct? Additionally, are there any resources someone can point me to related to finding anomalies in frequency response data? I am struggling to find the right keywords while googling. submitted by /u/Savage_Garbage [link] [comments]
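    The layout proposed above (one row of G1 values and one row of G2 values per PCB, with frequency left implicit) can be built from the long-format table as sketched below. The rows reuse the sample values from the post; `build_feature_vectors` is a hypothetical helper, and flattening the two rows into a single vector is an assumption that suits a dense autoencoder.

```python
from collections import defaultdict

# Long-format measurements: (pcb, frequency_hz, g1, g2), as in the sample table.
rows = [
    ("PCB 1", 1, 0.1, 1.0),
    ("PCB 1", 2, 0.2, 2.0),
    ("PCB 2", 1, 0.3, 3.0),
    ("PCB 2", 2, 0.4, 4.0),
]

def build_feature_vectors(rows):
    """Return {pcb: [G1 at each frequency..., G2 at each frequency...]}."""
    by_pcb = defaultdict(list)
    for pcb, freq, g1, g2 in rows:
        by_pcb[pcb].append((freq, g1, g2))
    features = {}
    for pcb, measurements in by_pcb.items():
        # Sort by frequency so every PCB uses the same column order.
        measurements.sort()
        g1_values = [g1 for _, g1, _ in measurements]
        g2_values = [g2 for _, _, g2 in measurements]
        features[pcb] = g1_values + g2_values
    return features

for pcb, vector in build_feature_vectors(rows).items():
    print(pcb, vector)
```

    Dropping the frequency column is only safe when every PCB is measured at exactly the same frequencies, which the sort step relies on but does not verify.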
    [R] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. From Anthropic. "We demonstrate a method for decomposing groups of neurons into interpretable features [...]".
    Paper. I am not affiliated with this paper or its authors. Twitter thread (Nitter alternative for those who want to see the entire thread without being logged into Twitter). Related work: Sparse Autoencoders Find Highly Interpretable Features in Language Models. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] Meta researchers present method for decoding speech from brain waves
    Researchers at Meta trained a deep learning model on brain recordings and audio data from 169 people listening to speech. Their method achieves up to 73% accuracy at identifying a 3-second clip of speech from non-invasive EEG or MEG scans. This is a massive improvement over previous attempts at decoding speech from neural signals. It approaches the performance of studies using implanted electrodes. The key innovations: A contrastive loss function that aligns latent speech and brain representations Leveraging pretrained speech models like wav2vec 2.0 Training one model on multiple subjects with individual tuning Being able to decode speech intention from brainwaves could one day help restore communication for patients suffering from strokes, ALS, etc. There's still a ways to go before this becomes a medical reality. Performance needs to improve and be validated during speech production rather than just passive listening. And the accuracy isn't high enough for natural conversations. But this is a hugely promising step toward brain-computer interfaces. Really interesting work at the intersection of neuroscience and AI! TLDR: New model achieves up to 73% accuracy decoding speech directly from non-invasive brain scans. Could eventually help patients with neurological conditions communicate just by thinking. Full summary here. Paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] EMNLP 2023 Notification
    Discussion thread for EMNLP 2023 notifications which will be released in a few hours along with GEM workshop. Best of luck to everyone. submitted by /u/EDEN1998 [link] [comments]  ( 9 min )
    [D] ordinal or nominal variable?
    Hey all, I am working with stock market data and scratching my head over whether certain variables are ordinal and can be left as is, or nominal and should be one-hot encoded. One of the variables in question is the direction of the market over a certain time. It has three categories: up, down, sideways. My hope was to code them as 1, -1 and 0 respectively and treat the variable as ordinal. There appears to be some order/relationship between them, but I'm not sure if it is enough. Is this the correct approach or should it be one-hot encoded? submitted by /u/Fishpo0 [link] [comments]  ( 9 min )
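    The two encodings under discussion look like this side by side. This is a sketch only; the function names and the column order are illustrative choices.

```python
DIRECTIONS = ("down", "sideways", "up")

def ordinal_encode(direction):
    """Single numeric feature: assumes down < sideways < up, equally spaced."""
    return {"down": -1, "sideways": 0, "up": 1}[direction]

def one_hot_encode(direction):
    """Three indicator features: assumes no order between the categories."""
    return [1 if direction == d else 0 for d in DIRECTIONS]

for d in DIRECTIONS:
    print(d, ordinal_encode(d), one_hot_encode(d))
```

    The ordinal version builds in the assumption that "sideways" sits exactly halfway between "down" and "up" in its effect on the target, while the one-hot version lets the model learn a separate effect per category at the cost of two extra columns.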
    [D] Deep Learning online course using PyTorch
    I've been out of the deep learning space for a while now and I'd like to take an online course, or set of courses, to get myself back up to speed on the latest techniques, architectures, and how to use them. I think the DeepLearning.ai specialization through Coursera is a good match, but I see that it uses TensorFlow. Is there any course like this that would use PyTorch? Or would the transition not be too hard once the fundamentals are in place? Thanks! submitted by /u/ComicFoil [link] [comments]
    Fine Tuning or RAG for Coding [D]
    Need some help: what is the best way to start? Please advise! I have specific code in my repos (let's say .NET + JS). The goal is to have prompt-based code adjustments to existing repos (like a very focused Copilot), either using a single agent or using something like AutoGen. So let's say I have thousands of files with code and some descriptions of the code's functionality (specs). I want to generate code based on the next spec, and I want the newly generated code to be similar in style to what is in my repos. So now the questions: Should I vectorize my code (what is the best way to do that?) or try to fine-tune some model? Give me your ideas / experience in code generation based on previous code. submitted by /u/mcwin1 [link] [comments]  ( 9 min )
    [Project] I built an open-source scraping API that returns structured JSON data using GPT.
    I decided to open-source my own web scraping API that I'm using to get information from different websites without using any selectors or XPath. Just provide the URL and a desired JSON schema, and it will return extracted data. Hope this can be helpful for someone. Cheers! https://github.com/semanser/JsonGenius https://preview.redd.it/icq1i8slvesb1.png?width=4096&format=png&auto=webp&s=ac86ccdb3da5ef1ffa86e3473619162f6b652ac6 submitted by /u/semanser [link] [comments]  ( 9 min )
    [R] Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] NIDDK-CR Data Centric Challenge: Enhancing NIDDK datasets for future artificial intelligence applications
    Calling all AI researchers! Using data aggregation, harmonization, fusion, and other data enhancement methods, you can help the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) enhance the utility of NIDDK datasets for AI applications. The goal of the NIDDK Data Centric Challenge will be to generate an “AI-ready” dataset that can be used for future data challenges, using data on Type 1 Diabetes available through the NIDDK Central Repository. Register today! https://www.challenge.gov/?challenge=niddk-central-repository-data-centric-challenge submitted by /u/DataCentricChallenge [link] [comments]
    [D] off-topic, is Meta Llama 2 license agreement safe to sign for commercial use?
    In the Meta Llama 2 license agreement (which can be found here), there is a "Prohibited Uses" section that clearly states several restrictions the signer must accept, and several of them use the word "facilitate". As far as I can understand, if we use Llama 2 as part of a commercial product, and some end user uses the product in a malicious way (say, gets the chatbot to write the recipe for mustard gas), then the creator of the product could be considered to be facilitating the end user. So my questions are: Do you think this is a fair interpretation of the agreement? Does that mean the creator is liable for whatever the model spits out? Is there a way to censor the model (short of training a new model or fine-tuning on a large scale)? Is there an open-source model that has already gone through this process and is safer for commercial use? https://preview.redd.it/3zo3tm4e8esb1.png?width=1197&format=png&auto=webp&s=8aa522183f82ba8f85edb69cbaabd93262efd516 As per @gentlecucumber's advice, I also posted it on r/legaladvice: https://www.reddit.com/r/legaladvice/comments/170ll2t/d_is_meta_llama_2_license_agreement_safe_to_sign/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/Particular_Flower_12 [link] [comments]
    [D] TesseractOCR vs PaddleOCR vs EasyOCR for Japanese text extraction
    Which would be the best OCR toolkit to invest effort in learning and building a pipeline around, for an OCR system that will be used to extract Japanese text? I tried Tesseract initially and although I got some good results, I found it hard to do fine-tuning due to messy and outdated documentation. I haven't had the time to look at the other two OCR tools yet, but if anyone has any experience, please do share it, especially regarding how easy or difficult the fine-tuning process is, as well as deploying the tuned models. submitted by /u/Spitfire_ex [link] [comments]  ( 9 min )
    [D] Adapting OpenSource GPT Models - requirements/possibilities?
    Hi, our company plans to set aside some budget in 2024 for hardware to do the following: run local LLMs so our coworkers can interact with a locally running, offline GPT similar to ChatGPT. Use cases: generating templates for emails, letters, etc.; translation (EN/GER/FR/SPA); querying internal knowledge bases and/or FAQs/HOWTOs. I did some research, but it is still hard for me to estimate the HW and AI skill requirements to implement something not a quarter as good as ChatGPT. I've played with Nomic's gpt4all, which comes close to a baseline. We can't use cloud services due to our data privacy policy, so I looked into what would be a good starting point for a hardware investment. I came up with a gaming PC (octa-core Intel i9/AMD Ryzen 7) with an NVIDIA RTX 4090 (24 GB) / Radeon RX 7900, a 2 TB SSD and 64 GB RAM for approximately 3600 EUR. I am pretty sure that would be sufficient to host a decent LLM serving simultaneous client requests. But is there also a way to adapt it to / process our company's data? Most sources state that proper LLMs were trained using hundreds of NVIDIA A100s and thousands of CPUs. On the other hand, we would be fine with just fine-tuning a pretrained model. Could you please point me to some sources to learn more about the possibilities and requirements so that we can make well-informed investment decisions? Also, we probably lack the required skills and would be interested to learn whether there are companies and/or projects that assist with this kind of task. Thanks submitted by /u/EatTFM [link] [comments]  ( 9 min )
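    A rough first check for the hardware question above is whether a model's weights fit into the 24 GB of GPU memory at all. The sketch below uses only the arithmetic parameters × bits per parameter; the model sizes and precisions are illustrative, and the estimate deliberately ignores activations, the KV cache and framework overhead, all of which need extra memory on top.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

GPU_MEMORY_GB = 24  # e.g. a single 24 GB consumer card, as in the post

# (label, parameter count, bits per parameter): illustrative combinations.
candidates = [
    ("7B, 16-bit", 7e9, 16),
    ("13B, 16-bit", 13e9, 16),
    ("13B, 4-bit", 13e9, 4),
    ("70B, 4-bit", 70e9, 4),
]

for label, n_params, bits in candidates:
    gb = weight_memory_gb(n_params, bits)
    verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
    print(f"{label}: {gb:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

    Serving several simultaneous requests increases the memory needed beyond the weights, so a model that only just fits is unlikely to be a comfortable choice.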
    [D] - Are LoRAs able to improve results on reasoning benchmarks or is full-parameter fine tuning required?
    Is there any good research on which benchmarks LoRAs are most effective at impacting, or are they relegated mostly to changing the style of an LLM's response? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] How to test if regression model is statistically significantly better, including its test error?
    I have a regression model that predicts the popularity of a text. I have its performance metrics on the test set, e.g. RMSE and MAE. This gives me an uncertainty estimate for its predictions. Now I want to transform the text in some way, e.g. give it to human experts or another model to "upgrade" (in terms of getting better popularity). So I have the original and the transformed text. Now I have 3 popularity scores: the true popularity for the original text, the predicted popularity for the original text, and the predicted popularity for the transformed text. Obviously, if the model MAE is for example around 5, and the predicted popularity for the transformed text is higher than for the original by 1.5, this can be totally random, due to errors in the model prediction. How can I measure whether the text transformation is beneficial, i.e. statistically significantly better than the original text, incorporating information about model quality? Requiring that the improvement be higher than the model error would be incredibly strict. submitted by /u/qalis [link] [comments]
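    One possible way to frame the question above is as a paired comparison over many texts rather than a per-text threshold: compute the difference (predicted popularity of the transformed text minus predicted popularity of the original) for each text and test whether the differences are centred on zero. The sketch below is an exact sign-flip permutation test; the `diffs` values are made up for illustration, and the approach rests on the assumption that the model's errors on a text and on its transformed version are similar enough to largely cancel in the difference. It says nothing about true popularity, only about the model's scores.

```python
from itertools import product

def sign_flip_p_value(diffs):
    """One-sided exact p-value for 'the mean paired difference is > 0'.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so every sign pattern is enumerated. Only
    practical for small samples (2**n patterns).
    """
    observed = sum(diffs)
    n_at_least = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = sum(s * d for s, d in zip(signs, diffs))
        if flipped >= observed - 1e-12:
            n_at_least += 1
        n_total += 1
    return n_at_least / n_total

# Hypothetical per-text improvements in predicted popularity.
diffs = [1.5, 2.0, 0.5, 1.0, 2.5, 1.2, 0.8, 1.7]
print(sign_flip_p_value(diffs))
```

    For larger samples the same idea works with randomly sampled sign patterns instead of full enumeration, or with a paired bootstrap confidence interval on the mean difference.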
    [D] David Donoho: Data Science at the Singularity (pushback on AGI singularity, advocates for Open Science and reproducibility)
    submitted by /u/wojcech [link] [comments]  ( 9 min )
    [R] Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
    Paper: https://arxiv.org/abs/2310.02244 Abstract: By classifying infinite-width neural networks and identifying the optimal limit, Tensor Programs IV and V demonstrated a universal way, called μP, for widthwise hyperparameter transfer, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for depthwise parametrizations of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-μP that extends μP and show empirically it admits depthwise hyperparameter transfer. We identify feature diversity as a crucial factor in deep networks, and Depth-μP can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (such as modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as Megatron transformer trained on Common Crawl. Interesting, great to see this line of work continued, muP was great, now Depth-muP submitted by /u/_puhsu [link] [comments]  ( 9 min )
  • Open

    Generative AI megatrends: Gen AI start-up ecosystem
    One of my students asked me: “Which is the best area/s for Gen AI start-ups?” This is not an easy question – mainly due to the dynamic nature of AI, but here are two reference points. The first is a Generative AI Tools Landscape from DataCamp. This gives both the categories and the subcategories for… The post Generative AI megatrends: Gen AI start-up ecosystem appeared first on Data Science Central.  ( 19 min )
  • Open

    AI: Voice cloning tech emerges in Sudan civil war
    A campaign using AI voice cloning technology to impersonate Omar al-Bashir, the former leader of Sudan, has gained attention on TikTok. The anonymous account has been posting what it claims are 'leaked recordings' of the ex-president, despite Bashir not being seen in public for a year and being believed to be seriously ill. Experts warn that campaigns like this demonstrate how new tools can quickly and cheaply distribute fake content through social media. The authenticity of the recordings has been questioned, but evidence suggests that voice conversion software has been used to mimic Bashir's voice. TikTok has taken down the account, stating that it violated their guidelines on posting false content and the use of synthetic media. Source : https://www.bbc.co.uk/news/world-africa-66987869 submitted by /u/NuseAI [link] [comments]
    When AI tells you what you want to hear, even if it knows it's not true ... A Bard example
    I love Bard. It eloquently tells me things in a way that meets and exceeds my expectations, and even more than GPT-4. But what is Google's strategy behind programming it to say things that it knows are not true? Do they train it to say what the user wants to hear? If so, that's disconcerting to say the least, although could be a brilliant way to comb users for enhancement requests. Here is my most recent experience, a humorous example which I feel illustrates the point perfectly: Me: Bard, can you take a .wav audio file and create a phoneme transcript using the IPA symbol library? Bard: Yes, I can work from a .wav audio file to transcribe your speech into IPA phonemes. You can upload the file to a cloud storage service such as Google Drive. Me: Great. Here's the link to the file: <link…
    Just created /r/thisiswhyaiwasmade for the more silly side of AI generated content
    Join us! /r/thisiswhyaiwasmade is where we have fun and celebrate the enjoyable and entertaining side of the tech. Need a place to put your silly and playful prompts, AI generated content, ideas for ways to bend the rules of AI? This is the place! submitted by /u/unknowingafford [link] [comments]
    CGPT-4, aside from solving climate change, what is the most positive thing that AI can do for the world?
    In the context of maximizing positive impact, AI has transformative potential in healthcare. Let's start with personalized medicine. AI can analyze vast data sets, consisting of genetic, environmental, and lifestyle variables, to tailor medical treatments for individuals. This level of personalization could substantially increase treatment effectiveness and reduce adverse side effects. We're not just talking about better pills; we're envisioning targeted gene therapies and custom-tailored interventions that could be more curative rather than merely palliative. Beyond personalization, AI can democratize healthcare access. Remote areas without specialists could have AI systems that diagnose and recommend treatments, making quality healthcare accessible without the need for extensive travel. Telemedicine could be supercharged by AI, offering medical consultations and second opinions to anyone with internet access. Moreover, AI could lead breakthroughs in drug discovery and epidemiology. It could analyze complex biochemical interactions at a speed unattainable by humans, slashing the time and cost to bring new drugs to market. On a larger scale, AI could monitor and model the spread of diseases, providing actionable insights for containment and treatment strategies. So, AI in healthcare is not merely an incremental improvement. It's a paradigm shift that could equalize healthcare access and significantly extend human life while improving its quality. All these advancements could happen within our lifetime, changing the face of medicine and, by extension, society. submitted by /u/Georgeo57 [link] [comments]
    What's the difference between a human's brain and AI?
    Functioning. Humans use the brain's computing power, memory, and ability to think, whereas AI-powered machines rely on data and specific instructions fed into the system. Besides, it takes humans a very long time to process and understand problems and get accustomed to them. submitted by /u/Virtual-Study-Campus [link] [comments]
    6 AI Apocalypse Scenarios And Why They're Wrong
    submitted by /u/arrowoftime [link] [comments]
    How to use custom instructions for ChatGPT like a Pro (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]
    DeepMind cofounder is tired of ‘knee-jerk bad takes’ about AI
    Mustafa Suleyman, the cofounder of DeepMind and CEO of Inflection AI, discusses his concerns about AI risks and the need for precaution. He believes that while some extreme scenarios may be over the top, it's important to treat powerful technologies with caution. Suleyman highlights the middle layer of AI risks that people often underestimate, which involves the amplification of goals for both good and bad actors. He emphasizes the need to contain AI to prevent potential negative consequences. Suleyman talks about the balance between risks and opportunities in technology and the importance of considering both aspects. He mentions the hype around generative AI and the need to look beyond the surface to understand its true potential. Suleyman also describes his conversations with lawmakers about AI and the challenge of bridging the gap between policy makers and tech experts. Source : https://venturebeat.com/ai/deepmind-cofounder-is-tired-of-knee-jerk-bad-takes-about-ai/ submitted by /u/NuseAI [link] [comments]
    Does Sam Altman Know What He’s Creating?
    submitted by /u/norcalnatv [link] [comments]
    DeepMind, Univ. of Illinois: Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    I need help finding a tool
    Does anybody know of a tool where I can take an image, have an AI translate it, and replace the text in the same style in the new language? For example, translating a Japanese image to English and having it look exactly the same, just in English. I'm looking for a free one that doesn't require credits. It can be a desktop app or a website; it just needs to be free. submitted by /u/agentduckman12 [link] [comments]
    How much do I have to edit AI generated images to become my own IP?
    Hey there! I'm a 1-man card game designer, and while juggling the project as well as my senior year of college, I have been relying heavily on AI-generated artwork to speed up my workflow with some illustrations and other forms of world-building. In regard to the recent legal decisions (in the US), under which any work produced by AI cannot be copyrighted, how much do I need to change the illustrations for them to become my own, if I even can at all? Thanks! Edit for clarity: I am also an illustrator. So this question comes from the perspective of an artist trying to save time and energy for other projects. submitted by /u/Luke192 [link] [comments]
    Comparative Evaluation of Fine-Tuned and Standard Language Models in Emulating Living Historical Figures: A Detailed Study Proposal
    submitted by /u/alcanthro [link] [comments]
    JPMorgan CEO Jamie Dimon: AI will lead to 3.5-day workweek | Fortune
    Jamie Dimon says the next generation of employees will work 3.5 days a week and live to 100 years old submitted by /u/AminoOxi [link] [comments]
    Google unveils Pixel 8 built for 'the generative AI era' | CNN Business
    submitted by /u/pehnsus [link] [comments]
  • Open

    Improve prediction quality in custom classification models with Amazon Comprehend
    In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this using an Amazon Comprehend custom classification to build a multi-label custom classification model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.  ( 8 min )
    Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium
    Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]  ( 7 min )
  • Open

    New tools are available to help reduce the energy that AI models devour
    Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.  ( 11 min )
  • Open

    OpenAI's justification for why training data is fair use, not infringement [pdf]
    submitted by /u/nickb [link] [comments]
    Traveling Words: A Geometric Interpretation of Transformers
    submitted by /u/nickb [link] [comments]
  • Open

    Tangent sum
    When I was writing my post on lemniscate functions yesterday, a line from the Wikipedia article seemed familiar for reasons I cannot place. Defining a tangent-sum operator as a ⊕ b := tan(arctan a + arctan b) gives cl² z ⊕ sl² z = 1. I feel like I’ve seen this tangent-sum used before, but […] Tangent sum first appeared on John D. Cook.  ( 6 min )
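    The tangent-sum operator quoted above has a closed form that follows from the tangent addition formula: a ⊕ b = (a + b) / (1 − ab) whenever ab ≠ 1. The sketch below is only an illustration of that identity; the function names are made up for this example and do not come from the post.

```python
import math


def tangent_sum(a: float, b: float) -> float:
    """Tangent sum as defined in the excerpt: tan(arctan a + arctan b)."""
    return math.tan(math.atan(a) + math.atan(b))


def tangent_sum_closed_form(a: float, b: float) -> float:
    """Closed form from the tangent addition formula; undefined when a * b == 1."""
    return (a + b) / (1 - a * b)


if __name__ == "__main__":
    for a, b in [(0.3, 0.5), (2.0, 3.0), (-1.5, 0.25)]:
        # The two columns should agree up to floating-point rounding.
        print(a, b, tangent_sum(a, b), tangent_sum_closed_form(a, b))
```

    The closed form avoids the trigonometric calls entirely, which is one reason the operator shows up in addition theorems.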
    Enriched categories
    We begin with a couple examples. First, the set of linear transformations from one vector space to another is itself a vector space. Second, the set of continuous linear operators from one Banach space to another is itself a Banach space. Or maybe better, this set can be made into a Banach space. In the […] Enriched categories first appeared on John D. Cook.  ( 6 min )
    p-norm trig functions and “squigonometry”
    This is the fourth post in a series on generalizations of sine and cosine. The first post looked at defining sine as the inverse of the inverse sine. The reason for this unusual approach is that the inverse sine is given in terms of an arc length and an integral. We can generalize sine by […] p-norm trig functions and “squigonometry” first appeared on John D. Cook.  ( 5 min )
    Geometric derivation of hyperbolic trig functions
    This is the third post in a series on generalizing sine and cosine. The previous post looked at a generalization of the sine and cosine functions that come from replacing a circle with a lemniscate, a curve that looks like a figure eight. This post looks at replacing the circle with a hyperbola. On the […] Geometric derivation of hyperbolic trig functions first appeared on John D. Cook.  ( 5 min )
  • Open

    HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world
    HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks. The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas
    Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research. The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.  ( 29 min )
  • Open

    Brains of the Operation: Atlas Meditech Maps Future of Surgery With AI, Digital Twins
    Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI Read article >  ( 7 min )
    Fall in Line for October With Nearly 60 New Games, Including Latest Game Pass Titles to Join the Cloud
    October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn Read article >  ( 9 min )

  • Open

    Ring Attention with Blockwise Transformers for Near-Infinite Context
    submitted by /u/nickb [link] [comments]
    Think before you speak: Training Language Models With Pause Tokens
    submitted by /u/nickb [link] [comments]
    Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/nickb [link] [comments]
    AI has been reading my mind.
    I know several people who tell me that whenever they say something out loud, they start seeing it advertised to them or on their feed. But for me, if I think of certain things, even if I never said them out loud, they will appear on my feed. Has anything similar been happening to anyone else? submitted by /u/GuaranteedBigBoy [link] [comments]
  • Open

    [P] Open-source project to run locally LLMs in browser, such as Phi-1.5 for fully private inference
    Excited to introduce BlindChat (https://github.com/mithril-security/blind_chat), an open-source, privacy-centric alternative to ChatGPT for in-browser Conversational AI! We provide full local inference in browser, by using libraries from Hugging Face like transformers.js or candle for WASM inference. We have supported several small models, the latest one being Phi-1.5, the 1.3B model that beat Llama 2 7b! As Microsoft’s researchers mentioned in their paper, the model often produces incorrect code and statements. They are just suggestions, and this model is not trained for instruction tuning, so it might be harder to use than regular chat. More info on their model card (https://huggingface.co/microsoft/phi-1_5). We would love to have your feedback on our project, as we are aiming to build a privacy-first and open-source alternative to ChatGPT! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] What is the relation between learning rate and vanishing gradient problem?
    How can we tackle vanishing gradient problem by changing the learning rate? Is it possible? submitted by /u/InternationalBack472 [link] [comments]  ( 9 min )
    [P] Torchsummary not working with your layers again? Try this lightweight alternative
    pip install output-shape It is a minimalistic and simple alternative to torchsummary with a simple print of the output shape of a layer, or custom layer. For torch.nn.MultiheadAttention, it handles both the output shape and the attn matrix separately. https://github.com/avocardio/output-shape Currently only works with PyTorch models, soon with Tensorflow / Keras as well. Jax is also on the list for later! submitted by /u/capital-man [link] [comments]  ( 9 min )
    [D] Thoughts on current Vector DB landscape?
    Hello, What are your thoughts on current Vector DB offerings? For instance: Do you think the pricing for them is reasonable/viable? Do you think there’s a sufficient level of developer/user experience? What about for those who aren’t necessarily specialized in data? If you like a managed service, why do you prefer it over the open source alternatives? submitted by /u/LucasSaysHello [link] [comments]  ( 9 min )
    [R] NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
    Project Page Paper Code We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot well adapt to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate the higher accuracy and compactness of our method than prior arts. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D]
    Hi guys! I am going to purchase a laptop for programming and AI tasks. I will be working on a simulation software project related to the trajectory of an object in 2D and 3D space. Which laptop would be most suitable for these tasks? It should have long battery life, because the place where I work does not have enough power sockets. The first laptops that came to mind were the MacBook Pro with the M2 Pro chip and the Lenovo ThinkPad X1 Carbon Gen 10. Please suggest the best option. submitted by /u/smitherium [link] [comments]  ( 9 min )
    [Discussion] Feature Selection Algorithms
    I have only 200 samples but about 30 features. What are some effective commonly used feature selection algorithms? I want to identify the features that play the most significant role in generating outcomes. submitted by /u/Shina-pig [link] [comments]  ( 9 min )
    [R] Will a small error be determining in the final decision for my paper?
    About a week ago, I submitted my first paper to one of the most prestigious Machine Learning conferences out there. This was a last-minute submission, and my supervisor and I were working on it simultaneously until the very last moment. Sadly, my supervisor made an error when writing the mathematical definition of a certain set, slightly changing its meaning. This change, even though small, alters the definition in such a way that the subsequent theorem and its proof aren't formally correct anymore, as they assume the original definition of the set, not the new one. How much will this affect the decision to accept or reject my paper? The whole method, results and consequences are still the same, regardless of this definition. It's more a problem of a "formal" nature (here "formal" as a word in the mathematical sense). Is there another way that I can report this error, maybe without changing the content? I know that at some point they give a chance to edit the original paper, but I don't know if this is after the decision to accept/reject. submitted by /u/howtorewriteaname [link] [comments]  ( 9 min )
    How can I apply object detection and image segmentation functionality to my current custom-trained Image Classification model? [D]
    So, a few months ago, I started developing this deep learning model, which was made purely to tell whether the input image shows driftwood floating in water or a crocodile. I leveraged the pre-trained SoTA resnet50 model to train my deep learning model, and for that, I downloaded almost 5k images of driftwood and crocodiles for training. Once the training was complete, I took the next step and deployed my model on the Hugging Face Spaces app, allowing my friends to put it to the test. But here's where I ran into a significant challenge: users could even upload their own selfies, and my model would attempt to predict whether they were a crocodile or a piece of driftwood! So how can I leverage an object detection or image segmentation pipeline so that when the user inputs an image, it first tries to detect the object in the photo and then determines whether the detected object is a crocodile or not? If no crocodile or driftwood is found, it should return "No object found" or something like that. submitted by /u/meWhoObserves [link] [comments]  ( 9 min )
    [R] Large Language Models Represent Space and Time
    Paper - https://arxiv.org/abs/2310.02207 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [R] Help Shape the Future of Machine Learning: Take Our Short Survey and Let's Create Something Amazing Together!
    Hello Redditors in r/MachineLearning We are the team behind ML Workbench, an upcoming integrated platform designed to streamline your entire machine learning lifecycle. From data preprocessing and model training to validation and deployment, we aim to make the process as seamless as possible. But here's the thing: we need your insights to build something that truly resonates with the community and solves real-world problems. 📝 Click Here to Take the Survey Why Should You Care? Unified Experience: Imagine managing all your ML tasks in one integrated environment. High-Performance Computing: We're leveraging powerful A100 GPUs to accelerate your work. User-Centric Design: Whether you're a beginner or a pro, the platform is designed to cater to all skill levels. Collaboration: Built-in features to make team collaboration effortless. What's in the Survey? The survey contains questions about your current challenges, the tools you use, and what you'd love to see in an ML platform. It should only take about 5-10 minutes to complete. Thank You Gift As a small token of our appreciation, we're offering exclusive early access to the platform for selected participants. Don't miss this chance to be among the first to experience what we're building! 📝 Click Here to Take the Survey Your feedback is crucial for us to create a tool that we hope will make a significant positive impact in the machine learning community. Thank you for taking the time to read this post and participate in our survey. Cheers, The ML Workbench Team submitted by /u/nonononottodayorever [link] [comments]  ( 9 min )
    [P] Video Event Detection
    Hi, I'm looking to create a model that given a sequence of frames from a video, returns a probability distribution over a set of events that may have occurred in those frames (probably 5 - 10 events). The training data will consist of video and hand labelled frame index/event pairs. I'm not too concerned about handling simultaneous events. It would be super helpful for some suggestions on a model architecture that would yield the best results and/or good papers/examples that achieve something similar. Thanks! submitted by /u/Dredgefort [link] [comments]  ( 9 min )
    [P] Retrieval augmented generation with OpenSearch and reranking [Video tutorial]
    I created a video tutorial that tries to demonstrate that semantic search (using embeddings) is not always necessary for RAG (retrieval augmented generation). It was inspired by the following Cohere blog post: https://txt.cohere.com/rerank/ I code up a minimal RAG pipeline: OpenSearch -> Rerank -> Chat completion (without using Langchain or similar libraries) and then see how it performs on various queries. Hope some of you find it helpful. Feel free to share any feedback! Video link: https://youtu.be/OsE7YcDcPz0 submitted by /u/mildlyoverfitted [link] [comments]  ( 9 min )
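    The three stages named in the post (keyword retrieval, reranking, chat completion) can be sketched in plain Python. This is a toy illustration under loud assumptions, not the tutorial's code: `lexical_search` stands in for an OpenSearch keyword query, `rerank` stands in for a learned reranker, `build_prompt` only assembles the text that would be sent to a chat model, and all names and the sample documents are hypothetical.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer; deliberately naive."""
    return text.lower().split()


def lexical_search(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Stand-in for a keyword search engine: rank by count of matching query terms."""
    query_terms = set(tokenize(query))
    scored = [(sum(1 for t in tokenize(d) if t in query_terms), d) for d in docs]
    hits = [(score, d) for score, d in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits[:k]]


def rerank(query: str, candidates: list[str], top_n: int = 1) -> list[str]:
    """Stand-in for a learned reranker: order candidates by Jaccard overlap."""
    query_terms = set(tokenize(query))

    def jaccard(doc: str) -> float:
        doc_terms = set(tokenize(doc))
        return len(query_terms & doc_terms) / len(query_terms | doc_terms)

    return sorted(candidates, key=jaccard, reverse=True)[:top_n]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the prompt that a chat-completion call would receive."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "opensearch supports keyword search with bm25",
        "rerankers reorder retrieved passages by relevance",
        "bananas are yellow",
    ]
    question = "how do rerankers reorder passages"
    candidates = lexical_search(question, docs)
    print(build_prompt(question, rerank(question, candidates)))
```

    Keeping the stages as separate functions makes it easy to swap any stand-in for a real search engine, reranker, or model client.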
    [R] Hacking an NLP benchmark: How to score 100 points on AMR parsing
    AMR parsing is a fun task where researchers map texts onto little graphs that explicate their meaning, so called Abstract Meaning Representations (AMRs). While arguably not the top NLP benchmark regarding popularity, research has been active for the last 10 years, including at major NLP conferences such as ACL/NAACL/EACL/EMNLP etc. Funnily, I recently found some vulnerabilities in the evaluation protocol, and if we exploit these vulnerabilities, we can get the highest score on the benchmark. To get an overview over the issue (without understanding AMR), imagine a cooking contest that takes place regularly, say, once a year. In all events, we have the same judge, participants are amateurs, meals are scored on 0 to 100, with 100 meaning “it can’t possibly get better”. Over the years, the …  ( 10 min )
    [D] Looking for an article related to machine learning in medicine to be presented at a journal club
    Hi all, I'm curious if anyone has a stand-out article they believe would prompt a lively discussion in a journal club I have coming up. Something that may have people take sides, or maybe a recent breakthrough in the ML space as it relates to clinical/health care. Thanks! submitted by /u/veilofosiris [link] [comments]  ( 9 min )
    [R] Think before you speak: Training Language Models With Pause Tokens
    Paper - https://arxiv.org/abs/2310.02226 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [P] Good models to use for multimodal object detection when both the modalities are image based or some object detection models which support ensembling out of the box like Yolov5?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    [P] Using pre-trained models as features?
    Hey everyone! Currently, I am working on a project around a music emotion classification/regression model. Basically, I am trying to predict a score for each emotion on a given song. The problem is that my dataset has quite imbalanced scores (y). Most scores are centered around a certain score range, so I am having difficulty predicting scores that are further away from the mean values. I had the idea of bringing in audio classification models pre-trained on other datasets and problems, as there are a bunch of well-performing pre-trained classification models out there already. The predictions of these pre-trained models (e.g. predicted genre, instrument, etc.) would be used as features alongside the original spectrogram in my model. I know this won't solve the problem of imbalance in the scores, but I thought maybe this could improve the performance, as the model would have more features to work with. Does this make sense? I appreciate any input. submitted by /u/Kniggi [link] [comments]  ( 9 min )
    [D] LOMO underrated
    Does anyone have an idea why the LOMO optimizer (low memory optimizer), which was released a few months ago, is not widely available, and why everyone still uses either Adam or SGD, even though the paper looks really promising? submitted by /u/RedMoula [link] [comments]  ( 9 min )
    [P] Camera based monitoring of infant's breathing
    Hi! I recently have seen systems that monitor breathing rate of an infant through camera. I have read several articles on that topic, where people used things like 3D camera, RGB or Interferometric Radar Sensor. Do you guys have any idea on how to accurately measure this? submitted by /u/kaina_m [link] [comments]  ( 9 min )
    [R] Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] How Do You Track Projects in a Scaling ML Team?
    I am part of a Machine Learning team that has experienced significant growth recently. When we were a small team, tracking projects was straightforward. However, as the team has expanded, it's become increasingly challenging to keep track of everything. We are part of a larger corporation, so we have access to tools for creating epics and boards. However, these corporate tools are too generic and don't provide the level of detail I need for internal management. Specifically, I'm looking for a way to track model versions, dataset versions, and the overall status of our projects. I'd also like to be able to assign team members to projects. Currently, we use a MIRO board, but it's disorganized and difficult to read and update. I'd love to hear what tools or strategies you've used for similar situations, especially since our team is expected to grow even more, making tracking increasingly complex. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
  • Open

    Lemniscate functions
    In the previous post I said that you could define the inverse sine as the function that gives the arc length along a circle, then define sine to be the inverse of the inverse sine. The purpose of such a backward definition is that it generalizes to other curves besides the circle. For example, it […] Lemniscate functions first appeared on John D. Cook.  ( 5 min )
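    The backward definition described above can be checked numerically for the circle. The sketch below is an illustration, not code from the post: it computes the arc length along the unit circle as the integral of 1/√(1 − t²) from 0 to y using a midpoint rule, then inverts that function by bisection and compares the result with the usual sine. The function names, step count, and bracket [0, 0.999] are assumptions made for this example.

```python
import math


def arc_length(y: float, steps: int = 2000) -> float:
    """Arc length on the unit circle from (1, 0) up to height y, for 0 <= y < 1.

    Midpoint-rule approximation of the integral of 1 / sqrt(1 - t^2) on [0, y].
    """
    h = y / steps
    return sum(h / math.sqrt(1 - ((i + 0.5) * h) ** 2) for i in range(steps))


def sine_from_arc_length(s: float, tol: float = 1e-10) -> float:
    """Invert arc_length by bisection; valid for s between 0 and about 1.5."""
    lo, hi = 0.0, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if arc_length(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for s in (0.25, 0.5, 1.0):
        # The second and third columns should agree closely.
        print(s, sine_from_arc_length(s), math.sin(s))
```

    Swapping the integrand for the arc-length element of another curve is what lets the same construction generalize beyond the circle.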
    Generalized trigonometry
    In a recent post I mentioned in passing that trigonometry can be generalized from functions associated with a circle to functions associated with other curves. This post will go into that a little further. The equation of the unit circle is x² + y² = 1, and so in the first quadrant y = √(1 − x²). The length of an arc from (1, 0) […] Generalized trigonometry first appeared on John D. Cook.  ( 5 min )
  • Open

    A Mine-Blowing Breakthrough: Open-Ended AI Agent Voyager Autonomously Plays ‘Minecraft’
    For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with Read article >  ( 6 min )
    How AI Helps Fight Wildfires in California
    California has a new weapon against the wildfires that have devastated the state: AI. A freshly launched system powered by AI trained on NVIDIA GPUs promises to provide timely alerts to first responders across the Golden State every time a blaze ignites. The ALERTCalifornia initiative, a collaboration between California’s wildfire fighting agency CAL FIRE and Read article >  ( 6 min )
  • Open

    LLMs May Be The Trojan Horse That Modernizes Software Development
    submitted by /u/geekteam6 [link] [comments]
    Why PepsiCo is powering your snacks with AI
    Using AI to improve Cheetos? That's something PepsiCo has experimented with. On today’s POLITICO Tech, Athina Kanioura, chief strategy and transformation officer for PepsiCo, says that using AI to make employees faster and more efficient hasn’t led PepsiCo to replace human workers as many fear. And why the company has determined that in some jobs the technology is simply off limits. Listen to the interview here: https://politico-tech.simplecast.com/episodes/why-pepsico-is-powering-your-snacks-with-ai submitted by /u/smo279 [link] [comments]
    New Paper: Enabling Language Models to Implicitly Learn Self-Improvement From Data
    LLMs keep getting more capable at generating natural language. But there's always room for improving the quality and alignment of their responses. Typically this requires lots of human effort to collect more training data. So researchers are exploring ways for models to self-improve without human involvement. Many methods use prompting - giving the LLM instructions to critique and refine its responses. But coming up with comprehensive prompts is challenging. The new approach proposed, called PIT, lets models learn self-improvement implicitly from human preference data instead. It reformulates reinforcement learning to maximize the gap between an original response and improved response conditioned on the original. This taps into the implicit guidance in the preference data on what constitutes better quality, so no manual rubrics are needed. PIT uses curriculum reinforcement learning - first improving easy references, then switching to the LLM's own samples. Experiments on real and synthetic datasets show PIT significantly outperforms prompting methods like Self-Refine. It improved response quality 7-34% across conditions without any human involvement. This demonstrates a promising direction for LLMs to align better with human preferences autonomously as they learn from experience. No need for human bottlenecks when expanding to new domains or underserved use cases. Very cool! TLDR: New method PIT enables LLMs to implicitly learn to refine themselves from human preference data, no prompts needed. Big improvement over prompting approaches. Full Summary Arxiv is here: https://arxiv.org/abs/2310.00898 submitted by /u/Successful-Western27 [link] [comments]
    $5k in grants or $250k funding for AI startups. Backed by OG's
    AI Grant is offering $5k in grants or $250k in funding for AI startups. The program is backed by OG's AI Grant, an accelerator for AI startups. The grant includes an uncapped SAFE investment of $250,000 for AI-native product startups, $350,000 in Azure credits, a summit in San Francisco with advisors and founders, and various other startup benefits and credits. The program was created by Nat Friedman and Daniel Gross. Applications for Batch 3 will open in a few months, but early applications are accepted. The program is open to anyone, and it is looking for companies or projects that leverage AI models in a useful or engaging way. Source : https://aigrant.com/ submitted by /u/NuseAI [link] [comments]
    AI will teach everyone to read and write. It's already begun.
    https://www.imagineworldwide.org/ "What is Child-Directed, Tech-Enabled Learning? Children drive their own learning, at their own pace, using software that provides a complete, research-based curriculum and pedagogy. Adults play a supportive, facilitative role. The software is delivered to the learner on a tablet, without connectivity, and charged by solar power or other appropriate energy sources... With hundreds of millions of children out of school or lacking access to effective schooling, this model can provide every child, everywhere access to learning. Solutions can work without internet access or grid power. Adults play facilitative, rather than instructional, roles. The annual unit cost of the learning solution is less than $7 per child and declining. This includes hardware, software, accessories, power, shipping, and implementation support from Imagine." submitted by /u/Georgeo57 [link] [comments]
    AI is replacing customer service jobs across the globe
    Artificial intelligence (AI) is replacing customer service jobs around the world, with chatbots being used to interact directly with customers and solve problems independently. This shift is expected to have a profound effect on economies, particularly in countries like India and the Philippines where call centers provide millions of jobs. While some argue that AI will provide support to remaining call center workers and improve job satisfaction, others warn that it could lead to job losses and a need for workforce adaptation. The use of AI software tools in call centers has shown potential for improving productivity and customer satisfaction. Source : https://www.washingtonpost.com/technology/2023/10/03/ai-customer-service-jobs/ submitted by /u/NuseAI [link] [comments]
    Female-founded AI startups win just 2% of funding deals in UK
    Female-founded AI startups in the UK account for just 2% of funding deals over the past decade, according to a report by the Alan Turing Institute. When female-founded companies do secure funding, they raise an average of £1.3m per deal, compared to £8.6m raised by all-male founder teams. The report highlights the urgent need for gender balance in AI investment, as the industry is predicted to grow significantly in the coming years. Recommendations to improve gender balance include improving recruitment, monitoring investment practices, and diversifying the ecosystem. There is an increasing demand for generative AI products, with leading tech companies investing heavily. Gender diversity gaps and uneven progress rates for ethnic and racial groups are observed across investment firms. AI products have shown biases, such as passport checkers working less efficiently with darker skin and tools reinforcing gender stereotypes. In 2019, a UN agency found that assigning female genders to digital assistants like Siri and Alexa perpetuated harmful gender biases. Source : https://www.theguardian.com/technology/2023/oct/04/female-founded-ai-startups-win-just-2-of-funding-deals-in-uk submitted by /u/NuseAI [link] [comments]
    I used Riffusion (Stable Diffusion, but for music) to turn my own music into "jazz", "Radiohead", "Muse" or "Nirvana" songs, I'm amazed by the results
    submitted by /u/cI_-__-_Io [link] [comments]
    Visa Announces $100 Mn Fund for Generative AI Companies
    submitted by /u/Agitated-Spell3979 [link] [comments]
  • Open

    My Impressions (and Application) of the Heidelberg Laureate Forum 2023
    This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second, and probably last, time. The HLF is an incredible experience for young researchers: mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year! The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.  ( 7 min )
  • Open

    Simplify medical image classification using Amazon SageMaker Canvas
    Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]  ( 11 min )
    Create an HCLS document summarization application with Falcon using Amazon SageMaker JumpStart
    Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]  ( 9 min )
    Automate prior authorization using CRD with CDS Hooks and AWS HealthLake
    Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]  ( 7 min )
  • Open

    Scalable spherical CNNs for scientific applications
    Posted by Carlos Esteves and Ameesh Makadia, Research Scientists, Google Research, Athena Team Typical deep learning models for computer vision, like convolutional neural networks (CNNs) and vision transformers (ViT), process signals assuming planar (flat) spaces. For example, digital images are represented as a grid of pixels on a plane. However, this type of data makes up only a fraction of the data we encounter in scientific applications. Variables sampled from the Earth's atmosphere, like temperature and humidity, are naturally represented on the sphere. Some kinds of cosmological data and panoramic photos are also spherical signals, and are better treated as such. Using methods designed for planar images to process spherical signals is problematic for a couple of reasons. Firs…  ( 92 min )
  • Open

    Why is DQN only suitable for small discrete action spaces? What is the issue if the action space is large and continuous?
    submitted by /u/aabra__ka__daabra [link] [comments]
    Up to date Metaworld documentation
    Hello everyone, I want to start experimenting with multi-task learning and meta-learning, so I pip installed metaworld, which is currently on version 2.0.0 if I'm not mistaken. Does anybody know of any recent, updated documentation? The Farama Foundation repository on GitHub, which is probably responsible for maintaining Metaworld, has outdated code and documentation (for example, the code in the GitHub README uses env.step(a) as if it returned 4 values, whereas the newer version returns 5). From what I understand, they are gathering contributors for a big push to bring the code and documentation on GitHub up to date again, but that announcement was 7 months ago. Sorry for the potentially wrong format of this question-post, I'm relatively new to Reddit. I would appreciate any further knowledge on this topic, and thanks to everyone who's taking the time to read it! Metaworld distribution from the Farama Foundation on GitHub: https://github.com/Farama-Foundation/Metaworld submitted by /u/South_Book_5625 [link] [comments]
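The 4-versus-5 return value mismatch mentioned above can be bridged with a small adapter. The following is a minimal sketch, assuming the five values follow the Gymnasium convention `(obs, reward, terminated, truncated, info)` and the four values follow the older Gym convention `(obs, reward, done, info)`; the helper name `unpack_step` is hypothetical and not part of Metaworld.

```python
def unpack_step(result):
    """Normalize an env.step(...) result to a 5-tuple.

    Assumption: 5 values mean (obs, reward, terminated, truncated, info)
    and 4 values mean (obs, reward, done, info).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
    elif len(result) == 4:
        obs, reward, done, info = result
        # Simplification: the old API does not separate the two cases,
        # so `done` is treated as termination and truncation as False.
        terminated, truncated = done, False
    else:
        raise ValueError(f"unexpected step() result of length {len(result)}")
    return obs, reward, terminated, truncated, info
```

Old README snippets can then be adapted by wrapping the call, for example `obs, reward, terminated, truncated, info = unpack_step(env.step(a))`, and ending the episode when either flag is true.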
    The future of game testing is here, and it is powered by Artificial Intelligence! 🔥
    Hi everyone! We used our opensource library SheepRL 🐑 and our PyTorch implementation of DreamerV3 on Crafter, an open-world survival game, featuring randomly generated 2D worlds, in which players have the freedom to explore a large and expansive map and need to forage for food, collect materials, build tools and find shelter. Here is a short video 👉 https://youtu.be/7XEBT2msUUQ In open-world games, ensuring they are playable and bug-free is crucial, but is becoming increasingly difficult and time-consuming using manual game testing. Maximizing exploration using Reinforcement Learning is extremely useful for testing games at scale, because of the wide variety of gameplay scenes the player may encounter. Why is the test on Crafter so interesting for game testing? Because Crafter evaluates a large number of general capabilities related to the RL agent, like strong ability to generalise (new generated maps for each episode), to deal with partial observability (each input image reveals only a small part of the world) and to long-term reasoning and survival. These abilities are very useful for testing games at scale, providing developers with insights to optimise gameplay and player experience. The future of game testing is here, and it is powered by Artificial Intelligence! 🔥 --- ❌ Are you interested in joining the project community? Get in touch 👉 https://github.com/Eclectic-Sheep/sheeprl ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI. Feel free to use it for your AI projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ submitted by /u/Manu_Orobix [link] [comments]
    Can I use Continuous algorithms (e.g. TD3) for Discrete Action spaces?
    My environment has hybrid action spaces and I was wondering if I can use continuous algorithms for discrete action spaces. I'm asking this because, well, agent can't learn and I'm trying to find the source of error. I was wondering if this was the source of problem. ​ My Assumptions On Solving This Problem: - Discrete is subspace of continuous, thus continuous algorithms will be able to handle discrete action spaces as well. - A non-hybrid action-space algorithm will be simpler than hybrid-action-space algorithms. ​ Method (I'm only describing the discrete action here): - Use TD3 as the training algorithm. No modification from the original training code. TD3 algorithm has been verified on Pendulum and other environments created for unit test purposes. - Policy network outputs the a…

  • Open

    Video Game Voice Actors Are Ready to Strike over AI. Here’s Why
    Video game voice actors are prepared to go on strike over the use of AI in game development. The current contract negotiations between the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) and video game companies have stalled, with the major issues being pay raises and the use of AI to alter or generate actors' performances. SAG-AFTRA wants protections for its members to ensure their work is not stolen or replaced by AI. If negotiations don't progress, voice actors, stunt artists, and motion capture performers could potentially go on strike, leading to delays in game releases and recasting of beloved performers. The voice actors' strike in 2016 resulted in improvements to pay, and now they are prepared to strike again to fight for their rights. Video game performances are often seen as assets to be extracted and inserted into games, rather than recognizing the humanity and quality of life of the performers. The use of AI in game development raises concerns about how companies will use advances in generative AI to steal work or put performers out of a job. SAG-AFTRA wants transparency, consent, and compensation when it comes to the use of AI in games. Members of SAG-AFTRA have voted in favor of authorizing a strike, meaning voice actors, stunt artists, and motion capture performers could potentially join the picket line if negotiations don't progress. The strike could lead to delays in upcoming game releases and the recasting of performers if companies refuse to meet the union's demands. The fight for voice actors' rights is an existential one, as they want to retain the rights to their own voices and images and achieve wages that keep up with inflation Source : https://kotaku.com/sag-aftra-strike-voice-actor-spider-man-ai-union-1850874117 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    [Question] Any 3X AI?
    Wanted to see if there are any 3X AI generated images available? I’m looking to see how I could use AI to generate images for my website. submitted by /u/IamMoe8868 [link] [comments]  ( 8 min )
    TikTok ran a deepfake ad of an AI MrBeast hawking iPhones for $2
    TikTok ran an ad featuring a deepfake of MrBeast offering iPhone 15 Pros for $2. AI-generated deepfake content is becoming more pervasive on social media platforms. Platforms like TikTok are facing challenges in moderating and handling the rise of AI deepfakes. MrBeast raised concerns about the ability of social media platforms to handle AI deepfakes. TikTok removed the ad and associated account for policy violations. Unauthorized AI-generated content featuring celebrities is a growing problem in platform advertising. The issue is expected to worsen as AI technology improves and becomes more accessible. Transparency and disclosure are crucial in AI-generated ad content featuring celebrities. TikTok is aware of the pervasiveness of AI-generated content on its platform and is taking steps to address it. Source : https://www.businessinsider.com/tiktok-ran-deepfake-ad-mrbeast-as-ai-generated-content-spreads-2023-10 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Infinitia will apparently let you create your own AI enabled social simulations
    Came across this upcoming game which supposedly lets you create your own worlds and characters to live in them. They also released a research paper explaining how they're doing it, using LLMs in all sorts of ways, primarily for reasoning and language. I think it could be a pretty fun take on passive games: just populating a world with your characters, checking up on them occasionally, putting them in weird situations lol. infinitia.ai for those who wanna check it out. The NPCs do seem to be acting in an interesting way, as I saw in this video they posted on Twitter: https://twitter.com/infinitia_app/status/1707102187518628245 What do y'all think? Another Smallville clone, or something interesting? submitted by /u/SeaJeweler3723 [link] [comments]  ( 9 min )
    Efficient AI design of robots.
    submitted by /u/DrJosh [link] [comments]  ( 8 min )
    From Stone to Silicon: The Odyssey of Humanity and Technology
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Don't Worry, AI Cannot Takeover the World, It Will Run Out of Battery
    The article discusses the importance of batteries in AI technology and how they limit the capabilities of AI robots. It explores the challenges of current battery technology and the need for better solutions. The article emphasizes the significance of developing ideal batteries that can provide long-lasting power without degradation. Source : https://notes.arkinfo.xyz/p/dont-worry-ai-cannot-takeover-the submitted by /u/NuseAI [link] [comments]  ( 9 min )
    GPT-4 outperforms its rivals in new AI benchmark suite GPT-Fathom
    Researchers from ByteDance and the University of Illinois have developed a benchmark suite with consistent evaluation settings, called GPT-Fathom, which indicates that GPT-4, the engine behind the paid version of ChatGPT, significantly outperforms leading LLMs, including its biggest competitor, Claude 2. For the latest advancements in AI, look here first. https://preview.redd.it/v4fo8zser0sb1.png?width=1292&format=png&auto=webp&s=7e29fe9ac1af3efcb936ee61e9202717eed7e702 GPT-Fathom's breakthrough: The new benchmark suite addresses inconsistent evaluation settings and prompt sensitivity, attempting to reduce inconsistencies in LLM evaluation. In a comparison using GPT-Fathom, GPT-4 outperformed over ten leading LLMs in most benchmarks, and the results show significant performance leaps from GPT-3 to its successors. Performance specifics: The gap in performance was especially pronounced against Claude 2, ChatGPT's biggest rival. GPT-4's Advanced Data Analysis model exhibited superior performance in coding, giving it an edge over Llama 2, the current best-performing open-source model. Llama 2-70B showed comparable or better performance than gpt-3.5-turbo-0613 in safety and comprehension but displayed worse performance in "Mathematics", "Coding", and "Multilingualism". The seesaw effect: The research team noted a 'seesaw effect' where an improvement in one area can lead to degradation in another. For instance, GPT-4 saw a performance drop on the Multilingual Grade School Math (MGSM) benchmark, despite improving its performance significantly on the text comprehension benchmark DROP. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Runway has launched Gen 2 Director mode. The speed at which this company works is Insane
    submitted by /u/First_Development101 [link] [comments]  ( 8 min )
    How Ai Could Upend Foreign Policy - An Interview with Ian Bremner and Mustafa Suleyman
    submitted by /u/finger_puppet_self [link] [comments]  ( 8 min )
    CGPT-4 on some ways "prompt/response" posts advance AI
    Prompt: Explain why posting prompts and your responses speeds up AI development in the following ways: It allows us to analyze responses to prompts that we would probably never have thought of. It allows us to share ideas that the open source community would advance. It allows us to better democratize the process of alignment, bringing in as many people as possible. CGPT-4: Posting prompts and AI responses serves multiple functions that are beneficial to the progress of AI development. First off, it exposes AI to a myriad of queries that developers might not have otherwise considered. This wide range of prompts allows the machine to improve its natural language processing abilities, making it more versatile and effective in interpreting and responding to human language. The more dive…  ( 9 min )
    A.I Makes a Video game on the App Store
    submitted by /u/usmansid98 [link] [comments]  ( 8 min )
    Infinite context windows? Streaming LLMs can be extended to infinite sequence lengths without any fine-tuning.
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
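The eviction policy summarized above (keep a few initial "sink" tokens plus the most recent ones) can be sketched as a toy cache. This is an illustration only, not the paper's implementation: the class name `SinkCache` and the parameters `n_sink` and `window` are hypothetical, the entries stand in for per-token key/value state, and the sketch does not model the attention computation or positional handling.

```python
from collections import deque


class SinkCache:
    """Toy cache that keeps the first n_sink entries and the last `window` entries."""

    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []                     # initial tokens, never evicted
        self.recent = deque(maxlen=window)  # rolling window of recent tokens

    def append(self, entry):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(entry)
        else:
            # A full deque silently drops its oldest item on append.
            self.recent.append(entry)

    def contents(self):
        return self.sinks + list(self.recent)


cache = SinkCache(n_sink=2, window=3)
for token_position in range(10):
    cache.append(token_position)
print(cache.contents())  # [0, 1, 7, 8, 9]
```

Plain windowed attention would correspond to `n_sink=0`, which is the configuration the summary says destabilizes predictions once the initial tokens are evicted.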
    Where can I produce AI intro and outro music for my podcast for free?
    I am starting a podcast on Psychology and Philosophy submitted by /u/21bce [link] [comments]  ( 8 min )
    BackerKit Will Restrict the Use of AI Art
    Crowdfunding site BackerKit has announced a new policy that restricts the use of solely AI-generated content on its platform. The policy aims to address concerns regarding ownership of content, ethical sourcing of data, and compensation for the process of creating content. Projects that lack a minimum requirement of human input will not be allowed to crowdfund on the BackerKit site. There is some flexibility with AI generative fill and the use of AI transcription services, but a high level of human input is required to satisfy the policy. BackerKit will automatically exclude all content uploaded by creators for their projects from AI training in support of this policy. The new restrictions will go into effect on October 4, giving creators time to alter their projects if they are using AI-generated images and text. Source : https://gizmodo.com/backerkit-ai-art-new-policy-crowdfunding-generative-1850891882 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/2/2023
    iPhone designer Jony Ive is reportedly talking to OpenAI CEO Sam Altman about making an AI hardware device.[1] Visa announced today that it plans to invest $100 million in companies developing generative AI technologies and applications “that will impact the future of commerce and payments.”[2] More than 40% of labor force to be affected by AI in 3 years, Morgan Stanley forecasts. [3] Tom Hanks: Don't fall for "AI version of me" promoting dental plan.[4] Sources: [1] https://www.businessinsider.com/chatgpt-head-iphone-designer-jony-ive-ai-device-openai-report-2023-9?amp [2] https://techcrunch.com/2023/10/02/visa-earmarks-100m-to-invest-in-generative-ai-companies/ [3] https://www.cnbc.com/2023/10/02/more-than-40percent-of-labor-force-to-be-impacted-by-ai-in-three-years-morgan-stanley-forecasts.html [4] https://www.cbsnews.com/amp/news/tom-hanks-ai-version-of-me-promoting-dental-plan/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    [D] What are some effective dimensionality reduction (unsupervised feature selection) techniques for a high dimensional, sparse dataset?
    I am considering comparing mutual information scores, but I also don't think I understand MI well enough. For example, I(X;Y) = H(X) + H(Y) - H(X,Y). To me, visualizing H(X) and H(Y) as venn diagrams and H(X,Y) as the information from both X, Y (like an overlapping venn diagram) makes me think that when X, Y are disjoint, then MI is 0 and when X, Y overlap completely, then the MI score will be high. So, I'm thinking that a high MI value is "bad" since this means X, Y would be redundant. I am not sure if my understanding here is correct. Another method I have tried is to binarize the data for each feature (represented as rows in my dataset) using "present" (1) and "absent" (0). The main issue I have run into doing this is that I am trying to then create a distribution to compare the fea…  ( 10 min )
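To make the identity I(X;Y) = H(X) + H(Y) - H(X,Y) concrete, here is a minimal sketch for discrete features using empirical frequencies. The function names are illustrative, and it assumes small discrete (for example binarized) columns. Note that H(X,Y) is the joint entropy, which corresponds to the union of the two circles in the Venn picture; the mutual information is the overlap.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


x = [0, 0, 1, 1]
independent = [0, 1, 0, 1]               # carries no information about x
print(mutual_information(x, independent))  # 0.0
print(mutual_information(x, x))            # 1.0, equal to H(x)
```

Under this reading, a high MI between two features does suggest redundancy between them, while a high MI between a feature and a target would be desirable; in the unsupervised setting described above only the first comparison is available.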
    [D] Best interface to use LLMs for code: Chat or completion?
    Hi everyone, I am quite interested in hearing the community's feedback on which interface works best for leveraging LLMs for coding productivity. Because LLMs tend to make mistakes, I have mostly used chat-like interfaces, like ChatGPT, as they allow me to interact with the model and converge on a conclusion. I haven't used Copilot for a while, but my feeling was that it could do some boilerplate correctly but then quickly started suggesting code that was misleading and could actually hurt productivity. It might have changed since then, but that was my feeling back then. What is your favorite option and why? View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] ML input data has to be derived from a larger dataset
    Hello everyone. I am curious to know if anyone has encountered an ML problem like this, and if so, I seek your advice. Usually in ML classification, as with the Iris dataset, each row represents a sample and each column a parameter, right? My problem is that my ML classification parameters have to be derived from a range of values (parent data). I have taken the mean of the parent values to generate the parameters for the ML input data. This results in lower classification accuracies using Random Forest and XGBoost. Has anyone encountered a similar situation where the data has to be generated from a range of other datasets? Is there any other way to do this? I did not find any papers or articles on the web, so I'm just asking. I can generate additional parameters from other statistics such as the median, standard deviation, etc., which can improve the classification accuracy but can make interpretation of the results a little weird, domain-wise. I wish to avoid this if possible. submitted by /u/notmyfault7676 [link] [comments]  ( 9 min )
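One way to see why the mean alone can cost accuracy is that very different parent ranges can collapse to the same value. The sketch below, with made-up numbers and a hypothetical `summarize` helper, derives a few summary statistics per sample using only the standard library.

```python
from statistics import mean, median, pstdev


def summarize(parent_values):
    """Collapse one sample's parent values into a dict of summary features."""
    return {
        "mean": mean(parent_values),
        "median": median(parent_values),
        "std": pstdev(parent_values),  # population standard deviation
        "min": min(parent_values),
        "max": max(parent_values),
    }


# Two illustrative samples with identical means but different spread.
flat = summarize([2, 2, 2, 2])
spiky = summarize([0, 4, 0, 4])
print(flat["mean"] == spiky["mean"])  # True
print(flat["std"], spiky["std"])      # 0.0 2.0
```

If interpretability is the main concern, a small set of statistics with a clear domain meaning (such as minimum and maximum) may be easier to explain than higher-order moments.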
    [D] Book review for Meta's ML Design interview? Machine Learning System Design Interview (by Ali Aminian and Alex Xu)
    I'm preparing for the ML system design interview for Meta, and I searched for various resources. This book (ML System Design Interview (by Ali Aminian & Alex Xu)) seems like a solid structured resource that covers solutions to case studies in detail. Has anyone used it to prepare for Meta's ML System Design interview? Thoughts? Khang's book doesn't seem to have great reviews. Chip Huyen's book (Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications) doesn't seem very focused on interview prep?? Also, happy to hear about other cool resources to prepare. Thanks very much! submitted by /u/irEFrienfk [link] [comments]  ( 9 min )
    [R] Open X-Embodiment: Robotic Learning Datasets and RT-X Models - DeepMind 2023 - RT-X exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms!
    Blog: https://www.deepmind.com/blog/scaling-up-learning-across-many-different-robot-types https://robotics-transformer-x.github.io/ here you can also find the Datasets and Code! Paper: https://robotics-transformer-x.github.io/paper.pdf Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train “generalist” X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. https://preview.redd.it/oxzutrhtb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=37b8b1dbf5f489dc2c8eaca4d15cb9c32ebc2660 https://preview.redd.it/ldsiwshtb1sb1.jpg?width=1494&format=pjpg&auto=webp&s=fdbf0f91c705acf11bff854f6d6af82dddd47021 https://preview.redd.it/ikk18jitb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=e50b443dc4b0266a0480d54c4f92a0b708485797 https://preview.redd.it/t5wmciitb1sb1.jpg?width=1361&format=pjpg&auto=webp&s=2971fd645acb6dcbed2ca3522e311d0772c45964 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Biggest problems with ML in industry?
    For all my corporate ML engineers I have a question, what are the most annoying / biggest problems you face when developing/deploying ML in industry? This can be anywhere from data, to tuning, to even MLOPS. submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [D] Difficulty with paper implementations on google colab
    I am not from a CS background; my knowledge comes from online courses and books, all of which used some variation of Jupyter notebooks, so my coding knowledge can be lacking at times. I am trying to run the code from some computer vision papers on newer samples. I understand the papers and the underlying mechanisms. However, I fail to decipher the code provided in the associated GitHub repositories. Usually, these repositories contain information on how to recreate the experiment on some specific data using the shell. But I am using Google Colab for this purpose, as I don't have access to a GPU, and I found it impossible to recreate the experiments in Google Colab using shell commands, let alone extend them to newer samples. I would appreciate some help in this regard. I haven't done this before, and there aren't really any tutorials or resources on how to do this. Ideally, what I am trying to do is separate the model, input some images, get the output, and interpret it. Right now I am trying to work with this paper, meta ood. I would appreciate any help, advice, or resources. I feel very lost. Thanks in advance. submitted by /u/franticpizzaeater [link] [comments]  ( 9 min )
    Repurposing a personal desktop computer [P]
    Hello! I'm debating turning my old desktop (old CPU but relatively new GPU 3980 or 90) into an ML box that I can remote into. I'm sure people here have done something similar, and I was wondering if anyone could point me towards some resources for getting it off the ground, any pitfalls to avoid, or suggestions. I'm an active data science researcher for my job and this would just be for fun side projects, but I have some pretty glaring holes in my knowledge of computers (like the best way to set this up: should I uninstall Windows and install Ubuntu, or is Windows fine?). Honestly, I'm sure my ignorance will be pretty apparent from the questions I'm asking/not asking, so any advice anyone has would be welcome! Thanks! Sorry if this is the wrong subreddit for this sort of thing. submitted by /u/shebaiscool [link] [comments]  ( 9 min )
    [R] Generative memory: generative diffusion models are equivalent to modern Hopfield nets
    https://arxiv.org/abs/2309.17290 submitted by /u/LucaAmbrogioni [link] [comments]  ( 8 min )
    [D] Stuck in Automation of AI models
    Hello everyone! ​ I'm currently working on a project and have hit a roadblock in automating the deployment of my machine-learning models. Can anyone provide guidance on the best practices or tools for streamlining the deployment process? Specifically, I'm looking to create a seamless workflow where models can be easily uploaded, deployed on the cloud, and accessible through APIs. Any insights or advice would be greatly appreciated! ​ Automation!!! submitted by /u/homelander81 [link] [comments]  ( 9 min )
    [P] The Case of the Missing Masterpiece
    Hi, I just wanted to share an applied image classification problem that I worked on a few years ago: https://vdalv.github.io/2018/09/01/missingMasterpiece.html submitted by /u/vdalv [link] [comments]  ( 9 min )
    Need to build an XAI model to explain the behaviour of an IDS [P]
    Hello, I need help from someone who knows about XAI. I have to create an XAI model to interpret the results of an AI model, an MLP, that works as an IDS classifier. I have no idea how to do it and I have been completely blocked for 2.5 years. This is the final project of my degree and I just don't know how to do it, and my tutor isn't very helpful. If anyone is able to help, I would explain what I have to do and would be very grateful. Thanks for your help submitted by /u/elMandarine [link] [comments]  ( 9 min )
    [D] Optimal scheduling tool with AI/ML recommendations
    Hello all, I'm trying to plan the development of a new web platform for workforce management but have little experience. We all know that general scheduling can be hard-coded, including managers polling shifts based on labor category, staff assignments, conflict resolution, emergency scheduling, etc. But what I want to research is how I can ensure that one optimal schedule is automatically computed using AI/machine learning tools, so that I don't have to go through the list of hard-coded generated schedules (I’m sure these will work fine, but I still want to compute one ultimate schedule). submitted by /u/Playful-Bed-2183 [link] [comments]  ( 9 min )
    [R] Break-A-Scene: Extracting Multiple Concepts from a Single Image
    ​ Break-A-Scene: Given a single image with multiple concepts, annotated by loose segmentation masks, our method can learn a distinct token for each concept, and use natural language guidance to re-synthesize the individual concepts or combinations of them in various contexts. Project Page: https://omriavrahami.com/break-a-scene/ Code is publicly released! Abstract Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed to improve the ability of combining multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method. ​ submitted by /u/sgd_is_all_you_need [link] [comments]  ( 9 min )
    [R] MIT, Meta, CMU Researchers: LLMs trained with a finite attention window can be extended to infinite sequence lengths without any fine-tuning
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed that LLMs attend heavily to the initial tokens as "attention sinks", even when those tokens are meaningless. This anchors the attention distribution. They realized that evicting these sink tokens warps the attention scores, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus the most recent ones. This lets LLMs handle crazy long texts. Models using StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
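The caching policy described in the post can be sketched in a few lines. This is only a toy illustration under stated assumptions: the class name `SinkKVCache` and its parameters are hypothetical, integers stand in for per-token key/value tensors, and positional-encoding details are ignored entirely. It is not the paper's implementation.

```python
from collections import deque


class SinkKVCache:
    """Toy KV cache: keeps the first `n_sink` entries plus a rolling window of recent ones."""

    def __init__(self, n_sink: int = 4, window: int = 8):
        self.n_sink = n_sink
        self.sinks = []                      # entries for the initial "attention sink" tokens
        self.recent = deque(maxlen=window)   # rolling window; the oldest entry drops automatically

    def append(self, kv):
        # The first n_sink entries are pinned; everything after goes into the rolling window.
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def contents(self):
        return self.sinks + list(self.recent)


cache = SinkKVCache(n_sink=2, window=3)
for token_id in range(10):       # stand-in for per-token key/value tensors
    cache.append(token_id)
print(cache.contents())          # [0, 1, 7, 8, 9]
```

Plain windowed attention would correspond to `n_sink=0`, which is the configuration the post says breaks down once the first tokens are evicted.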
    [D] Really good dataset for a Course Capstone
    Hey everyone! My friends and I are taking a Data Science course in our university. We are modestly versed in ML/DL techniques, and want to use everything we know on a really good capstone project for this course. We are looking for a dataset where we can demonstrate a nice variety of techniques to really blow the socks off our Professor. Ideally we'd like this to be stemming from something basic that most would consider "Data Science", as in something with a tabular dataset and elements of classification. Though we still want chances to bring in what we know from outside the course: for example, if there's images to supplement the dataset we could use Image Classification models or something multimodal to bring in more features, if there's natural language data then we could use LLMs to extract salient features etc. More importantly though, we want something whose exploration can be really motivated so it doesn't seem we're only in it for the ML aspect. Thank you! submitted by /u/Subject-Revolution-3 [link] [comments]  ( 9 min )
    [D] Competitiveness in ML research
    I've been diving deep into the world of machine learning research, and I'm genuinely baffled: how on Earth do some researchers seem to pump out paper after paper? I mean, there's only 24 hours in a day, right? Are academic minions (i.e. PhD students) doing all the heavy lifting? Or maybe some highly efficient workflows I'm not privy to? On a more serious note, I would like a career in ML, and the sheer volume and pace of these publications is making me feel a bit disheartened. How is this prolificity possible? Any words of encouragement or advice? submitted by /u/blabboy [link] [comments]  ( 9 min )
    [D] Why should I use a hosted/cloud VectorDB solution over a serverless or in-process vector store?
    Why the hell should I use a cloud-based or server-hosted solution over an easy-peasy serverless variant like LanceDB, when even a FAISS vector store is enough for most small-to-medium use cases? I often see posts like "oh my stack is... pinecone Chroma weaviate_io" and they just ingest mini sets of data, what the hell man submitted by /u/Dear_Bullfrog193 [link] [comments]  ( 9 min )
    [P] FontoGen: generating true-type fonts
    I'd like to share a project that I've spent a few weekends working on. FontoGen is an autoregressive encoder-only transformer model that's capable of generating true-type fonts. GitHub: https://github.com/SerCeMan/fontogen Weights: https://huggingface.co/SerCe/fontogen Blog post with more details: https://serce.me/posts/02-10-2023-hey-computer-make-me-a-font The project is largely an exploration of whether generating fonts natively, line by line, is possible. I'm not aware of any previous research that would achieve the same results for complete fonts previously. This is my first ML-specific project, and I would appreciate any feedback on the model architecture, and I'm also happy to answer any questions you may have. submitted by /u/SerCeMan [link] [comments]  ( 9 min )
    [D] What happens after removing the causal mask of LLaMA?
    The causal mask in LLaMA serves as a protective barrier to prevent information leakage. However, in certain tasks, leveraging information leakage can be a beneficial strategy for enhancing performance, particularly in tasks like token classification, such as Named Entity Recognition (NER). Interestingly, the paper titled "Label Supervised LLaMA Finetuning" (available at https://arxiv.org/abs/2310.01208) reveals a significant performance boost in token classification when the causal mask is removed. submitted by /u/seanlee97 [link] [comments]  ( 9 min )
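To make the "information leakage" point concrete, here is a small sketch of what a causal mask allows versus a full (bidirectional) mask. The function name `attention_mask` and the example sentence are illustrative assumptions, not taken from the paper.

```python
def attention_mask(seq_len: int, causal: bool = True):
    """Return a seq_len x seq_len matrix; mask[i][j] is True if position i may attend to j."""
    return [[(j <= i) or not causal for j in range(seq_len)] for i in range(seq_len)]


tokens = ["Barack", "Obama", "visited", "Paris"]
causal = attention_mask(len(tokens), causal=True)
full = attention_mask(len(tokens), causal=False)

# With the causal mask, "Barack" (position 0) sees only itself...
print([t for t, ok in zip(tokens, causal[0]) if ok])   # ['Barack']
# ...without it, the same position also sees the tokens to its right.
print([t for t, ok in zip(tokens, full[0]) if ok])     # ['Barack', 'Obama', 'visited', 'Paris']
```

For a tagging task such as NER, the label of an early token often depends on words that follow it, which is one plausible reading of why removing the mask helps there.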
    [R] RA-DIT: Retrieval-Augmented Dual Instruction Tuning
    New paper that proposes instruction-tuning with in-context retrieval-augmentation to improve SOTA LLMs in cases where access to large, external knowledge sources is needed. Tested on LLaMA 65B, 13B and 7B. https://arxiv.org/abs/2310.01352 submitted by /u/todpole3 [link] [comments]  ( 9 min )
    [D] How do you scale computationally intensive Python scripts?
    Hey ML Community, I'm wondering how people currently go about scaling their Python programs. Let's say, for instance, you're doing batch inference using an LLM. Each prediction takes 2-3 minutes to process; how would you go about scaling that to make a million predictions? I'm asking this question because a few months back I started building a tool to quickly parallelize Python functions across thousands of machines in the cloud. I'm focused on making the barrier to interacting with the cloud extremely low and want to know all the core alternatives out there. Also, if you have any advice on starting a business I'd love to hear it. submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
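As a baseline for comparison, the single-machine version of this pattern fits in the standard library. The sketch below assumes each prediction is an I/O-bound call to a model server; `predict` is a hypothetical stand-in, not a real API.

```python
from concurrent.futures import ThreadPoolExecutor


def predict(prompt: str) -> str:
    """Hypothetical stand-in for one slow inference call (e.g. a request to a model server)."""
    return prompt.upper()


def batch_predict(prompts, max_workers: int = 8):
    # Threads suit I/O-bound calls such as remote requests; results keep the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, prompts))


print(batch_predict(["a", "b", "c"]))   # ['A', 'B', 'C']
```

At 2-3 minutes per prediction, a million predictions is roughly 2-3 million minutes of sequential work, so the same map-over-inputs shape has to be spread across many workers or machines to finish in a reasonable time.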
    [D] What is the highest quality automatic image captioning solution?
    I make very high quality LoRAs and finetuned Stable Diffusion models. These models yield very good results, but more importantly they are very easy to use, as I have always captioned my images the way one would use natural spoken language (no weird booru tags and all that jazz). The most labor-intensive process in the workflow is image captioning. For example, my last project had almost 10000 images in the data set. Every single image was manually captioned by me, as the quality of all the automated solutions I tried is subpar and has too many accuracy issues. I have tried BLIP auto captioning and LLaVA, but they still were not accurate enough for what I needed. I am hoping someone here can suggest a solution, if one exists, thanks. submitted by /u/no_witty_username [link] [comments]  ( 9 min )
    [D] (Interview Help) Do you know any good resources for interview case studies in the finance domain (especially dealing in loan and credit cards)
    I'm preparing for a data science interview and am looking for case study prep resources, especially for the financial domain (loans and credit cards). Mainly, I want to understand some good metrics for the financial domain, ways to break down the questions and create a rough data model, kinds of conditions to take into consideration (eg. Seasonality), kinds of effects that can be used expected (like opportunities and risks), etc. Any resources or help is greatly appreciated! submitted by /u/how_the_turn_tablez [link] [comments]  ( 9 min )
  • Open

    Help Restricting Actions
    Hello, I am new to RL, and I am currently working on a school project that requires it. I am working on making a model to play a game very similar to Wordle, so for the purposes of this post it may as well be Wordle. Right now I am trying to get it to work with this gym https://github.com/zach-lawless/gym-wordle, and I will make my tweaks later. This gym has a multi-discrete action space, which makes sense to me for a word, though I don't know if that's best. To validate words, it has its own exception type. I am trying to train this with stable_baselines3, but the exception keeps being raised, since the model is trying to guess garbled words like "xcjhr". Is there a way I can validate actions before they are made so the model is restricted to only guessing valid words? Is there a better way to do this? It doesn't need to be the best, it really only has to sorta work. Any help is appreciated, thanks! submitted by /u/ClackHack [link] [comments]  ( 9 min )
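One common way to restrict a policy to legal moves is action masking. The sketch below assumes the action is an index into a list of allowed words rather than five separate letters; `WORDS`, `masked_sample`, and the example logits are all hypothetical and not part of gym-wordle. It also assumes at least one action is valid.

```python
import math
import random

WORDS = ["crane", "slate", "pious", "mound"]   # hypothetical stand-in for the gym's word list


def masked_sample(logits, valid, rng=random):
    """Sample an action index from softmax(logits), giving zero probability to invalid actions."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    top = max(masked)
    weights = [math.exp(l - top) for l in masked]   # exp(-inf) == 0.0, so invalid actions drop out
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.2, 1.5, -0.3, 0.9]             # would come from the policy network
valid = [True, False, True, True]          # e.g. words already ruled out by earlier feedback
action = masked_sample(logits, valid)
print(WORDS[action])                        # never prints "slate"
```

With word indices as actions, every guess is a real word by construction, so the environment's exception should not fire. For Stable-Baselines3 specifically, the sb3-contrib package's `MaskablePPO` may be worth a look for mask-aware training.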
    Looking For Advice on Training and Reward Functions
    Hi Everyone, I'm venturing into a new territory of Reinforcement Learning (RL) through a personal project, despite having a solid background in various other ML domains. I'm developing an RL agent to play Skyjo, a turn-based card game, and I'm encountering some challenges related to reward optimization and game-ending decisions by the agents. I'd appreciate any advice or insights you might have! Project Overview: Objective: Develop an RL model to play Skyjo competitively. Environment: Built using Gymnasium and Pytorch. Agents: Two agents working in tandem - one for card selection (discard/draw) and the other for action and location selection. Training: 4-8 agent instances play against each other. Repository: https://github.com/grantslewis/auto_skyjo Reward Structure: Small p…  ( 10 min )
    My frustration level with Torch/Keras/Tensorflow and DQNs is killing me
    RANT: I've tried every possible example I can get my hands on. I've looked at reference examples. I've looked at Medium articles. I've looked at stuff written by college freshmen. Every example I find for a DQN written either for torch or tensorflow (and either tf_agents or keras), seems to either have a nasty bug preventing it to work or such a severe memory leak that it is unusable. I tried Torch recently and was doing some simple gridworlds. It does fine for tiny gridworlds like 5x5. I decided to push it a little (not much at all) to a known 21x21 gridworld from recognized papers - reference example died and ran out of memory after 3000 episodes - I mean - really? 3000 episodes? I ran on CPU and gave it 64GB. I don't know how much memory this SHOULD take. I can do it in a Q-Table for…  ( 10 min )
    Advice to improve outcome on a turn-based strategy game
    Hello everyone, I'm a total beginner in the reinforcement learning (RL) community, and I would appreciate some advice on a problem I'm currently facing. I've created a simple 2D turn-based game with only movement at the moment (I will also add combat features when I have success with training an AI for the movements). Game The rules are simple : A grid of 14x40 (560 cells in total) 1 Agent with a limited number of Move Point (MP) 1 Target that does not move (atm) The agent can end its turn to get its MP back I already implemented a pathfinding algorithm using A* which works really well but I would like to train an AI to reach the target as fast as possible (turn-wise). Here is a simulation of a state : ​ https://preview.redd.it/0p5yijnb60sb1.png?width=442&format=png&auto=…  ( 10 min )
    Cleanba, our new distributed DRL platform is finally out 🤗
    submitted by /u/vwxyzjn [link] [comments]  ( 8 min )
  • Open

    DSC Weekly 3 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 3 October 2023 appeared first on Data Science Central.  ( 20 min )
    Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think?
    One of the most impressive generative AI applications I have seen is viperGPT. The image / site explains it best. The steps are: This example, earlier this year, showed the potential of multimodal LLMs And as of last week, that future is upon us ChatGPT can now see, hear & speak. What are the implications… Read More »Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.  ( 20 min )
    Cracking the code: The rising demand for data scientists in various industries
    In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of… Read More »Cracking the code: The rising demand for data scientists in various industries The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.  ( 21 min )
    Generative AI megatrends: How many LLMs would you subscribe to?
    I recently subscribed to OpenAI GPT-4 for the OpenAI Code Interpreter/Advanced Data Analysis. We are using it in our class at the University of Oxford. It's really cool, and we are also waiting for the multimodal OpenAI features. Recently, a well-known AI critic said that he does not see how Generative AI companies could be… Read More »Generative AI megatrends: How many LLMs would you subscribe to? The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.  ( 19 min )
    A few highlights of the Efficient Generative AI Summit (EGAIS)
    Large language models (LLMs) for generating text and vision models for generating images are notoriously inefficient. The larger they get, the more power hungry they become.   Kisaco Research in September hosted a one-day event in Santa Clara dedicated to the topic of generative artificial intelligence (GAI) efficiency, followed by a three-day Summit on Hardware and… Read More »A few highlights of the Efficient Generative AI Summit (EGAIS) The post A few highlights of the Efficient Generative AI Summit (EGAIS) appeared first on Data Science Central.  ( 21 min )
  • Open

    AI copilot enhances human precision for safer aviation
    Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.  ( 8 min )
  • Open

    Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI
    A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities. The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.  ( 10 min )
  • Open

    Meet the Maker: Robotics Student Rolls Out Autonomous Wheelchair With NVIDIA Jetson
    With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair Read article >  ( 6 min )
    CG Geek Makes VFX Look Easy This Week ‘In the NVIDIA Studio’
    Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.  ( 8 min )
  • Open

    From graph theory to category theory
    Let G be a directed graph whose nodes are the positive integers and whose edges represent relations between two integers. In our first example we’ll draw an edge from x to y if x is a multiple of y. In our second example we’ll draw an edge from x to y if x ≥ y. […] From graph theory to category theory first appeared on John D. Cook.  ( 6 min )
    Test functions
    Test functions are how you can make sense of functions that aren’t really functions. The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended […] Test functions first appeared on John D. Cook.  ( 6 min )
    Groups vs Abelian groups: Pedantic or profound?
    This article will probably only be of interest to a small number of readers. Those unfamiliar with category theory may find it bewildering, and those well versed in category theory may find it trivial. My hope is that someone in between, someone just starting to get a handle on category theory, will find it helpful. […] Groups vs Abelian groups: Pedantic or profound? first appeared on John D. Cook.  ( 7 min )
  • Open

    DALL·E 3 system card
    No content preview  ( 1 min )

  • Open

    [Discussion] I didn't do well in Calculus III
    So I got an A in calculus three but I probably didn't deserve it since it was online and all I did was look up the answer and understand the problems given on the test. So I probably have a C level understanding. Will I be tested on calc 3 knowledge in machine learning or should I retake calc 3? submitted by /u/Glittering-Target-87 [link] [comments]  ( 9 min )
    [P] Hand keypoint detection
    Hello Reddit, I have a question regarding the right tool. I'm looking for a tool / model to detect hand keypoints in a video stream of a person assembling stuff. I know OpenPose is a possible one, as is Google MediaPipe. I’m not really getting along with OpenPose, and MediaPipe doesn’t show really good results. In my project, I would like to detect hand keypoints in assembly scenarios. It would be OK to use 2 cameras or a depth camera if necessary. Does anybody know any models / tools to use? Thanks in advance :) submitted by /u/VGHMD [link] [comments]  ( 9 min )
    [P] Best option for a large, local embedding database?
    Langchain offers a wide array of vector databases for text embedding models. I need to create a vector database for around 3 million sentence embeddings, each being of dimension 384. I'm building a prototype, so it has to be local and free of charge to use. So far, I've hit limits for Chroma (41,666 max). I've also tried Redis, QDrant and FAISS - each of these gets so large that it eats up all the RAM and the process gets killed, or with QDrant, just errors out. I've used Pinecone before, but I don't really want to pay for a prototype as I have plenty of disk space. I was thinking of chunking the 3 million documents into local vector stores of size 41,666 using ChromaDB - but there isn't a whole lot out there about whether Chroma would allow me to merge all ~70 of these smaller databases into a bigger one for search. I also cannot find whether it would be possible to load all 70 of these into memory and search each one individually. So what are my options? My other thought was just creating a large Doc2Vec model, however I would like to use something more sophisticated like Huggingface embedding models. submitted by /u/russ_fegoli [link] [comments]  ( 9 min )
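On the question of searching many small stores individually: for exact search, querying each shard for its own top-k and then merging the hits gives the same result as searching one big index. The sketch below shows that merge step with plain lists; all names (`search_shard`, `search_all`, the toy shards) are illustrative assumptions and no vector-database API is implied. For scale, 3 million vectors of dimension 384 in float32 come to about 3,000,000 × 384 × 4 bytes ≈ 4.6 GB of raw data before any index overhead.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_shard(shard, query, k):
    """Brute-force top-k within one shard; `shard` is a list of (doc_id, vector) pairs."""
    scored = ((cosine(vec, query), doc_id) for doc_id, vec in shard)
    return heapq.nlargest(k, scored)


def search_all(shards, query, k):
    """Query each shard separately, then merge the per-shard hits into a global top-k."""
    hits = []
    for shard in shards:   # in practice, load one shard at a time from disk
        hits.extend(search_shard(shard, query, k))
    return heapq.nlargest(k, hits)


shards = [
    [("a", [1.0, 0.0]), ("b", [0.0, 1.0])],
    [("c", [0.9, 0.1]), ("d", [-1.0, 0.0])],
]
print([doc for _, doc in search_all(shards, [1.0, 0.0], k=2)])   # ['a', 'c']
```

Because only one shard has to be resident at a time, peak memory is bounded by the shard size rather than the full collection, at the cost of touching every shard per query.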
    [D] Proof of convergence for a heavy-ball adaptive step-size algorithm for non-convex functions
    Hello everyone, I am struggling with proving convergence for an optimizer that combines an adaptive step size with the heavy-ball algorithm, for convex and non-convex functions. In some literature, I could find a regret-bound analysis/proof for convex functions, and a proof that the estimated gradient goes to zero as k -> inf for non-convex functions. There are some assumptions and preconditions: The algorithm is heavy-ball momentum with an adaptive step size: x_{k+1} = x_k - \eta_k \nabla f(x_k) + \beta (x_k - x_{k-1}). The following assumptions are made: A. The function is smooth. B. The function is Lipschitz. C. The gradients are Lipschitz. I am attempting to prove convergence to a critical point or a local minimum, where the estimate of the gradient at iteration k goes to zero, i.e. E[\nabla f(x_k)] -> 0 as k -> inf. Could anyone please guide me through the process of a convergence proof for non-convex functions, or give me literature recommendations for the same? Thank you very much in advance. submitted by /u/Loose_Foundation5990 [link] [comments]  ( 9 min )
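For readers who want to see the iteration in the post running, here is a minimal numerical sketch of the heavy-ball update in one dimension. The function name `heavy_ball`, the quadratic test objective, and the constant step used in place of the adaptive rule are all assumptions for illustration; it demonstrates the recurrence, not a convergence proof.

```python
def heavy_ball(grad, x0, step, beta=0.5, iters=200):
    """Iterate x_{k+1} = x_k - step(k) * grad(x_k) + beta * (x_k - x_{k-1})."""
    x_prev, x = x0, x0
    for k in range(iters):
        x_next = x - step(k) * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# f(x) = x^2, so grad f(x) = 2x; a constant step stands in for the adaptive rule.
x_final = heavy_ball(grad=lambda x: 2 * x, x0=5.0, step=lambda k: 0.1)
print(abs(x_final) < 1e-6)   # True
```

Passing `step` as a function of `k` leaves room to plug in whatever adaptive schedule is being analysed and to check empirically whether the gradient norm shrinks before attempting the proof.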
    [D] open problems after GPT4 capabilities
    We all know that LLMs (and especially foundation models) are extremely functionally capable. Has anyone made a nice list of deficiencies that they show? I know Gary Marcus did so many years ago, but after GPT3 and GPT4 -- what is still unsolved? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    [D] Hoeffding's inequality, does it make sense practically?
    According to it, enlarging the hypothesis set loosens the upper bound on the gap between in-sample and out-of-sample error. Can't we subdivide the hypothesis set into multiple smaller ones, ensuring tighter bounds in general? And more generally, have you seen it used in practice? I have seen a lot of ML projects without anybody mentioning it or anything theoretical. submitted by /u/2azo [link] [comments]  ( 9 min )
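For reference, the bound being discussed is usually stated as follows for a finite hypothesis set. The notation ($E_{\text{in}}$, $E_{\text{out}}$, $N$ samples, $M$ hypotheses, final hypothesis $g$) is the common textbook convention and is assumed here rather than taken from the post.

```latex
% Single hypothesis h fixed before seeing the data:
\Pr\big[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,\big] \le 2e^{-2\epsilon^{2}N}

% Final hypothesis g chosen from M candidates after seeing the data (union bound):
\Pr\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \le 2Me^{-2\epsilon^{2}N}

% Splitting the set into subsets of sizes M_1, ..., M_m and picking the subset
% using the same data still requires a union bound over all of them:
\sum_{i=1}^{m} 2M_{i}e^{-2\epsilon^{2}N} = 2Me^{-2\epsilon^{2}N},
\qquad M = \sum_{i=1}^{m} M_{i}
```

Under this reading, subdividing only tightens the bound if the subset is fixed before looking at the data, which amounts to having used a smaller hypothesis set from the start.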
    [P] Good models to use for multimodal object detection when both the modalities are image based or some models which support ensembling?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    Benefits of converting DICOM images to PNGs [P]
    I am trying to understand the benefits of converting DICOM images to PNGs. Context: I have DICOM images from which I have already extracted the useful metadata I want to use. The images are for a classification-detection pipeline for some disease. So, as I already asked, what are the benefits of converting those DICOM files to PNGs rather than just using pydicom and the DICOM pixel_array? The reason I ask is that I saw many top-5 users on Kaggle do this when dealing with DICOM images. If I understand how networks actually work, they take as input an array of pixels as floating-point numbers, no? So what is the difference between a DICOM pixel_array and a PNG's pixel array, or a numpy array or tensor? Both will eventually be fed to the network as a tensor of floating-point numbers. Is the reason that PNGs are usually faster to train on? Is the reason that PNGs have more library support for preprocessing / augmentation / etc.? Is the reason that PNG is the format many pre-trained models expect? (I write this knowing it's 99% not true, given the tensor point above.) Thanks in advance, and please forgive my English (I could use AI tools to fix it but I feel addicted already) submitted by /u/01jasper [link] [comments]  ( 9 min )
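One concrete difference between the two arrays is bit depth: DICOM pixel data is commonly stored with 12 to 16 bits per pixel, while an 8-bit PNG holds only 256 levels (PNG also supports 16-bit grayscale). The sketch below shows what a naive rescale to 8 bits does to nearby intensities; `to_uint8` and the sample values are hypothetical, and real pipelines typically choose a clinically meaningful window rather than the full range.

```python
def to_uint8(pixels, lo, hi):
    """Linearly map raw intensities in [lo, hi] to 0..255, clipping values outside the window."""
    out = []
    for p in pixels:
        clipped = min(max(p, lo), hi)
        out.append(round((clipped - lo) * 255 / (hi - lo)))
    return out


raw = [1000, 1001, 1002, 1003, 4095]        # hypothetical 12-bit values (0..4095)
print(to_uint8(raw, lo=0, hi=4095))         # [62, 62, 62, 62, 255]
```

So the conversion is itself a preprocessing decision: it can speed up data loading and fix the intensity window once, but whatever the window discards is no longer available to the network.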
    [D] What kind of distribution is this?
    Hey guys, I am wondering what kind of distribution my data are following? I want to fit a distribution function to them and use this fitted distribution function to generate new samples with a given mean and standard deviation (python). Any tips for this? Happy to hear your suggestions :) https://preview.redd.it/kdcftvpq8urb1.png?width=408&format=png&auto=webp&s=6163b9f571069e098c9e9a609c3d1cb9910fe1fb submitted by /u/Tigmib [link] [comments]  ( 9 min )
    [R] Efficient Streaming Language Models with Attention Sinks - Meta AI 2023 - StreamingLLM enables Llama-2, Falcon and Pythia to have an infinite context length without any fine-tuning! Allows streaming use of LLMs!
    Paper: https://arxiv.org/abs/2309.17453 Github: https://github.com/mit-han-lab/streaming-llm Abstract: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of wind…  ( 9 min )
    [Project] I just released an open-source package, TorchLens, that can extract the activations/metadata from any PyTorch model, and visualize its structure, in just one line of code. I hope it helps you out!
    You just give it any PyTorch model (as-is, no changes needed), and it spits out a data structure with the activations of any layer you want, along with a bunch of metadata about the model and each layer and an optional automatic visualization of the model's computational graph. I hope this greatly speeds up the process of extracting features from models for further analysis, and also serves as an aid in quickly understanding new models. I also hope it'd be helpful for teaching purposes, too. It is meant to work for any PyTorch model whatsoever and I've tested it on hundreds of models (see the "model menagerie" of visualizations below), though it's always possible I've missed some edge case or another. Hope it helps you out--I'm still actively developing it, so let me know if there's anything on your wishlist! https://preview.redd.it/k37nhejvxtrb1.png?width=640&format=png&auto=webp&s=5713a8711110644794e2264d84dd479ede861c5e GitHub Repo Twitter Thread Paper CoLab Tutorial Gallery of Model Visuals submitted by /u/therealjmt91 [link] [comments]  ( 9 min )
    [D] Why Vision Transformers?
    Transformers have been the new kid on the block, easy to see why with LLMs and sequential output generation, but I still don't know why vision transformers based on ViT are so hot in the field right now. From my understanding, CNNs are just vastly better than transformers for vision tasks, as their inductive biases allow them to determine the relationship between neighboring features of an image via pooling and filters. However, transformers don't have this kind of inductive bias and, as a result, take much more data and compute to reach similar levels of performance. I read this survey paper on Vision Transformers here: https://arxiv.org/pdf/2012.12556.pdf, which has the performance of CNNs vs various transformer models for CV. Comparing even the best vision transformers to the classic …  ( 10 min )
    [R] Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] Awesome AI developer productivity Github repo
    Hello everyone, We've begun gathering a variety of AI coding tools used in one place to make things easier for everyone. We're inviting everyone to check out our collection, and maybe even add tools you find useful. You can find the repository here: https://github.com/gaborsoter/awesome-ai-dev-productivity Feel free to explore and contribute! submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [R] On the Biometric Capacity of Generative Face Models
    We developed a statistical model to estimate “How many unique identities can a generative face model generate?” without exhaustively generating a lot of faces. Abstract: There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: “Given a generative face model, how many unique identities can it generate?” In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and comparing different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We employ our approach on multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and “Generated Photos,” as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have a capacity upper bound of 1.43 million and 11,900, respectively; (b) the capacity reduces drastically as we lower the desired FAR with an estimate of 17,960 and 562 at FAR of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t age. Paper: https://arxiv.org/abs/arXiv:2308.02065 Code: https://github.com/human-analysis/capacity-generative-face-models submitted by /u/VishDev [link] [comments]  ( 9 min )
    [P] Comgra: A library for debugging and understanding neural networks
    I'm a machine learning engineer and researcher. I got fed up with how difficult it is to understand why neural networks behave the way they do, so I wrote a library to help with it. Comgra (computation graph analysis) is a library you can use with PyTorch to extract all the tensor data you care about and visualize it graphically in a browser. This allows for a much more detailed analysis of what is happening than the usual approach of using TensorBoard. You can investigate tensors as training proceeds, drill down into individual neurons, inspect single data sets that are of special interest to you, track gradients, compare statistics between different training runs, and more. This tool has saved me a ton of time in my research by letting me check my hypotheses much more quickly than normal and by helping me understand how the different parts of my network really interact. I first published this a month ago and have made some improvements since then. I would be happy to hear even more feedback! My goal is to make this the go-to library used both by novices who want to understand what's going on under the hood, and by researchers in neural architecture design. submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] The most complete Audio ML toolkit 🚀
    Hugging Face Transformers is a complete audio toolkit that provides state-of-the-art models for all audio tasks, including TTS, ASR, audio embeddings, audio classification and music generation. All you need to do is install the Transformers package: pip install --upgrade transformers And then all of these models can be used in just 3 lines of code: ​ TTS Example usage: from transformers import pipeline generator = pipeline("text-to-speech", model="suno/bark-small") speech = generator("Hey - it's Hugging Face on the phone!") Available models: Bark https://huggingface.co/suno/bark MMS TTS https://huggingface.co/facebook/mms-tts-eng VITS https://huggingface.co/kakao-enterprise/vits-vctk SpeechT5 https://huggingface.co/microsoft/speecht5_tts And more! https://huggingface.co/mo…  ( 9 min )
    [R] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - Microsoft 2023 - 166 Pages!
    Paper: https://arxiv.org/abs/2309.17421 Youtube: https://youtu.be/Q0pP782dSh0?si=MiJAlK5k-KEyQ-Zr Abstract: Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models. 
submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] NanoPhi, Implementing some of the success of Phi-1.5, with GPT-2(124m)
    Hi, I'm trying to replicate at least some of the success of Phi-1.5 on a model 10x smaller, GPT-2 124M. I have started with model finetuning and have a simple GitHub repo with a roadmap, https://github.com/VatsaDev/NanoPhi, check it out there! submitted by /u/vatsadev [link] [comments]  ( 9 min )
  • Open

    Code Llama code generation models from Meta are now available via Amazon SageMaker JumpStart
    Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]  ( 11 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 1
    A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]  ( 10 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 2
    In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]  ( 9 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 3
    This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]  ( 9 min )
  • Open

    [D] RL agenda after LLMs or S4?
    Many other students in my research institution are pretty worried after ChatGPT / LLMs about continuing work in RL and are thinking of leaving the field. What are the main open problems in RL now that LLMs and S4 can solve a hefty chunk of sequence learning problems? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    RLHF without GAE
    If I already have a trained reward model, say a sentiment classification model, that I'd like to use for PPO-based RLHF, I believe the standard method would be to instantiate the Critic/value function using the reward model, and train it further during PPO, correct? Would it even make sense to try PPO for RLHF without using the GAE term and thus without the value function, and just directly using the reward model's output as the advantage? It seems that this would require viewing the entire generation as a single action (rather than each token's generation as an action), but most of the articles I've read on RLHF seem to treat it that way. On the other hand, all the code implementations I've seen have an Actor-Critic model producing values at each token, which I think implies that each token is an action. Edit: Apologies if any of this is just me having fundamental gaps in my understanding! submitted by /u/ganzzahl [link] [comments]  ( 9 min )
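The two views in the question above (each token as an action with GAE, versus the whole generation as one action) can be compared in a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, the reward model is assumed to score only the final token, and the per-token KL penalty that RLHF implementations commonly add is not modeled.

```python
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Per-token Generalized Advantage Estimation.

    rewards: one reward per generated token (assumed zero except the last).
    values:  critic estimates, with one extra bootstrap entry at the end
             (0.0 when the generation has terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def sequence_level_advantages(final_reward, num_tokens, baseline=0.0):
    """Critic-free alternative: treat the whole generation as one action
    and give every token the same (reward - baseline) advantage."""
    return [final_reward - baseline] * num_tokens


# Hypothetical example: four tokens, reward model scores the full text 1.0.
rewards = [0.0, 0.0, 0.0, 1.0]
with_critic = gae_advantages(rewards, [0.2, 0.4, 0.6, 0.8, 0.0])
without_critic = sequence_level_advantages(1.0, num_tokens=4)
```

With an all-zero critic and `gamma = lam = 1`, `gae_advantages` returns the final reward for every token, which is exactly the sequence-level case. In that sense dropping the value function is the special case of GAE with no baseline, and the critic's role is to reduce the variance of that estimate.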
    3-player graph pursuit game
    I am trying to find a Nash equilibrium (NE) using RL algorithms for a turn-based deterministic graph pursuit game. I have a way of checking whether the strategies of players 1, 2, 3 are a NE, and thought of using Q-learning to see if it converges to a NE. Thus far it doesn't seem to work and I wonder if I made a mistake. The state is described as: St = [x1 x2 x3 p], where the current player is p and x1, x2, x3 are the locations of the players in the graph. Players have value functions Q^1(St), Q^2(St), Q^3(St). The way I update my value function is: player i chooses an e-greedy action a_t, leading to the new state St_new, and then Q^i(St) = (1-alpha)*Q^i(St)+alpha*gamma*Q(St_new). I have tried using a memory buffer but it hasn't improved the convergence success. I check whether the values are a NE every 1000 iterations. It only converges for simple graphs. Do you think the way I update my value function is correct? Do you have any other traditional algorithms to suggest? Shall I move to deep learning? I am worried that if simple algorithms can't converge, the neural networks won't either... I tried to implement Nash Q-learning following the paper: https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf but I am not sure if I implemented it correctly for a turn-based game. submitted by /u/__gp_ [link] [comments]  ( 9 min )
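For comparison with the update rule quoted in the post, here is a minimal tabular sketch of the textbook temporal-difference update, which also includes the immediate reward and stops bootstrapping at terminal states. The state tuple layout follows the post; the function name, the reward values, and the graph positions are hypothetical, and whether the poster's game hands out rewards only at capture is not stated in the source.

```python
from collections import defaultdict


def td_update(q, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular TD(0) update of a state-value table.

    q:        dict-like table mapping state -> value for ONE player.
    state:    tuple (x1, x2, x3, p) as described in the post.
    reward:   immediate reward received by this player for the transition.
    terminal: if True, the next state's value is not bootstrapped.
    """
    target = reward if terminal else reward + gamma * q[next_state]
    q[state] = (1 - alpha) * q[state] + alpha * target
    return q[state]


# One value table per player, all starting at zero.
tables = {player: defaultdict(float) for player in (1, 2, 3)}

# Hypothetical transition: player 1 moves from node 0 to node 1 and the
# game ends with reward 1.0 for player 1.
s = (0, 4, 7, 1)
s_next = (1, 4, 7, 2)
td_update(tables[1], s, reward=1.0, next_state=s_next, alpha=0.5, terminal=True)
```

If every reward is zero and no state is terminal, this reduces to the rule in the post, in which case all values stay at their initial value. That is one thing worth checking when convergence only happens on simple graphs.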
  • Open

    Save 20 Hours A Week With This 1 Simple ChatGPT Prompt for Productivity
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    ‘AI Anxiety’ Is on the Rise–Here’s How to Manage It
    Artificial intelligence (AI) anxiety is on the rise, but there are ways to manage it. While AI may outperform humans in certain tasks, humans are not yet headed for all-out replacement. Recent research shows that AI programs scored higher than humans in tasks requiring originality, but the highest-rated human ideas were still considered more creative. The rise of generative AI tools in industries like animation has left some professionals anxious about the future of their work. Experts suggest managing AI fears by understanding the historical context of technological advancements and focusing on the benefits and training opportunities that AI brings. Source : https://www.scientificamerican.com/article/ai-anxiety-is-on-the-rise-heres-how-to-manage-it/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
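The interleaving described above (natural-language rationale, then a program, then the tool output fed back into the reasoning) can be sketched roughly as follows. This is not the paper's code: the model is replaced by a hard-coded list of steps, Python's standard-library `Fraction` stands in for SymPy so the sketch has no dependencies, and all function names are hypothetical.

```python
from fractions import Fraction


def run_tool(program):
    """Execute a generated program and return the value it stores in `answer`.

    NOTE: exec() on model-written code is unsafe outside a sandbox; it is
    used here only to keep the sketch short.
    """
    namespace = {"Fraction": Fraction}
    exec(program, namespace)
    return str(namespace["answer"])


def solve(question, scripted_steps):
    """Build a trace that alternates rationale, program, and tool output.

    scripted_steps stands in for what a language model would generate:
    a list of ("rationale" | "program", text) pairs.
    """
    trace = ["question: " + question]
    for kind, text in scripted_steps:
        trace.append(kind + ": " + text)
        if kind == "program":
            # The tool result is appended so later reasoning can use it.
            trace.append("output: " + run_tool(text))
    return trace


steps = [
    ("rationale", "Add the two fractions exactly with a tool."),
    ("program", "answer = Fraction(1, 3) + Fraction(1, 6)"),
    ("rationale", "The tool output is the final answer."),
]
trace = solve("What is 1/3 + 1/6?", steps)
```

The design point is that the exact arithmetic happens in the tool call, while the surrounding text carries the plan and the interpretation of the result.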
    Bard, what would it take to fast-track AI morality so that it surpasses human ability by 2025?
    To fast-track AI morality to surpass human ability by 2025, Google, OpenAI, Anthropic, and other developers could: Invest heavily in AI ethics research and development. For example, Google AI has launched the Responsible AI Research and Development (RAIR) program, which is focused on developing new AI ethics research and tools. Create large-scale datasets of human moral reasoning. For example, OpenAI has released the Moral Machine dataset, which contains over 1 million crowdsourced responses to moral dilemmas. Develop new AI architectures that are specifically designed for moral reasoning. For example, Anthropic is developing a new AI architecture called "Moral Machine Learning," which is designed to learn from human moral reasoning data. Establish a global AI ethics research consortium and create a public database of human moral reasoning datasets. For example, the Partnership on AI has launched the Safeguarding AI initiative, which is working to develop new safety mechanisms for AI systems. Fund research into developing new AI architectures for moral reasoning and develop new AI evaluation metrics for moral performance. For example, the Moral Machine project is developing new evaluation metrics for AI systems' moral performance. By working together, Google, OpenAI, Anthropic, and other developers can help to fast-track AI morality and create AI systems that are more moral than humans. (199 words) submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    AI & Us Navigating the Digital Renaissance
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Prompt engineering questions
    Is prompt engineering a legit job? Is it here to stay? Is it worth studying? What is the best way to study it and land a job or freelance work? submitted by /u/metasubcon [link] [comments]  ( 8 min )
    What app/program are they using on this Instagram?
    How does one make videos like on this Instagram page? It's unreal. https://instagram.com/nostalgicraindrops?igshid=MzRlODBiNWFlZA== submitted by /u/CK1886 [link] [comments]  ( 8 min )
    ChatGPT Can Now See? Mind-Blowing Ways People Can Use Image Recognition!
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Let’s make a list of the BEST AI TOOLS for students
    Every day, new AI tools appear. There are also AI tools designed to make students' lives easier, from AI essay generators to study organizers. While there are many directories with AI tools, they are often not well-sorted for students. So, I've compiled a list of my favorite AI tools for educational purposes.
    AI tool | How to use for studies
    Bing Chat | Writing Excel formulas; making graphs and charts; answers for homework assignments; researching for a paper
    Textero.ai | Search for relevant academic sources for essays; research assistance with the "Ask AI" feature; essay generation and paper formatting; structured essay outline creation; summarizing of texts
    ChatPDF | Interacting with academic PDFs; asking specific questions about the content; quickly locating essential data for assignments
    Socratic | Breaking down complex homework questions; providing step-by-step educational guidance; safe and interactive learning
    Writely AI | Improving grammar and writing clarity; creating concise study notes; feedback for content quality
    Turnitin | Checking for copied content; comparing against a vast academic database; highlighting potential plagiarism
    Got any to add to the list? Let's share and help each other! submitted by /u/loyallyUrticate [link] [comments]  ( 9 min )
    Tested Dalle, created a monster.
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 8 min )
    Meta's Llama 2 Long outperforms GPT 3.5 and Claude 2
    Meta Platforms recently introduced Llama 2 Long, a revolutionary AI model that outperforms top competitors with its ability to generate accurate responses to long user queries. For the latest advancements in AI, look here first. https://preview.redd.it/geqqd3k5rprb1.png?width=1920&format=png&auto=webp&s=e72a67fc7ef7e85902169f3061529c136beadc87 Meta's new AI model As an enhancement of the original Llama 2, Llama 2 Long deals with larger data containing longer texts and is modified to handle lengthier information sequences. Its stellar performance outshines other models such as OpenAI's GPT-3.5 Turbo and Claude 2. How Llama 2 Long works Meta built different versions of Llama 2, ranging from 7 billion to 70 billion parameters, which refines its learning from data. Llama 2 Long employs Rotary Positional Embedding (RoPE) technique, refining the way it encodes the position of each token, allowing fewer data and memory to produce precise responses. The model further fine-tunes its performance using reinforcement learning from human feedback (RLHF), and synthetic data generated by Llama 2 chat itself. Impressive feats and future aspirations Llama 2 Long can create high-quality responses to user prompts up to 200,000 characters long, which is approximately 40 pages of text. Its ability to generate responses to queries on diverse topics such as history, science, literature, and sports indicates its potential to cater to complex and various user needs. The researchers see Llama 2 Long as a step towards broader, more adaptable AI models, and advocate for more research and dialogue to harness these models responsibly and beneficially. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
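As a rough illustration of the Rotary Positional Embedding (RoPE) mentioned above, the sketch below rotates each adjacent pair of vector components by an angle proportional to the token position. It is a dependency-free toy, not Meta's implementation: the pairing of adjacent dimensions and the `base` value of 10000 are assumptions taken from the commonly described formulation, and the summary does not say which settings Llama 2 Long uses.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one query or key vector.

    vec must have even length; components (0,1), (2,3), ... are treated
    as 2-D points and rotated by position * base**(-i/d).
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    """Plain dot product, as used for an attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Same relative offset (2 tokens apart) at two different absolute positions.
near = dot(rope(q, 5), rope(k, 3))
far = dot(rope(q, 12), rope(k, 10))
```

The two scores agree up to floating-point error because rotating both vectors makes their dot product depend only on the difference between positions, which is the property that makes RoPE a relative position encoding.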
    AI Image Generator That Is Good At Referencing Pop Culture
    I've recently tried Canva and Dall-E to generate an image that references two popular games, Dark Souls 3 and Baldur's Gate 3, and they both fall on their face. Maybe my prompt is bad, but Canva is not getting me what I want, and Dall-E ran out of free credits. Do you guys have any recommendations? Midjourney is no longer free. I would like something that is free and handles references to popular culture well. submitted by /u/livingroomsessions [link] [comments]  ( 9 min )
  • Open

    Awarded DAGM MVTec Dissertation Award 2023
    In September, I received the DAGM MVTec dissertation award 2023 for my PhD thesis. DAGM is the German association for pattern recognition and organizes the German Conference on Pattern Recognition (GCPR) which is Germany's prime conference for computer vision and related research areas. I feel particularly honored by this award since my academic career started with my first paper published as part of the young researcher forum at GCPR 2015 in Aachen. The post Awarded DAGM MVTec Dissertation Award 2023 appeared first on David Stutz.  ( 3 min )
  • Open

    Supereggs, squigonometry, and squircles
    The Depths of Wikipedia twitter account posted a screenshot about supereggs that’s popular at the moment. It says there’s no way this is real. they must be making these words up above a screenshot from the Wikipedia article on supereggs saying The definition can be changed to have an equality rather than an inequality; this […] Supereggs, squigonometry, and squircles first appeared on John D. Cook.  ( 5 min )
    Corny AI
    Meredith Whittaker posted on Twitter that In addition to being the best in privacy, Signal is also the best in not subjecting you to corny ‘AI’ features no one asked for or wants. I love the phrase “corny AI.” That’s exactly what a lot of AI features are. “Would you like help composing that tweet?” […] Corny AI first appeared on John D. Cook.  ( 5 min )
    Today’s star
    The star-like image above is today’s exponential sum. The exponential sum page on my site generates a new image each day by putting the numbers of the day’s month, day, and year into the equation and connecting the partial sums in the complex plane. Here m is the month, d is the day, and y […] Today’s star first appeared on John D. Cook.  ( 5 min )
  • Open

    A more effective experimental design for engineering a cell into a new state
    By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.  ( 10 min )
    Is AI in the eye of the beholder?
    Study shows users can be primed to believe certain things about an AI chatbot’s motives, which influences their interactions with the chatbot.  ( 10 min )
  • Open

    Fishing with Neural Nets | Transforming Ecology with Artificial Intelligence
    submitted by /u/plutoandmal [link] [comments]  ( 8 min )
  • Open

    Google at ICCV 2023
    Posted by Shaina Mehta, Program Manager, Google Google is proud to be a Platinum Sponsor of the International Conference on Computer Vision (ICCV 2023), a premier annual conference, which is being held this week in Paris, France. As a leader in computer vision research, Google has a strong presence at this year’s conference with 60 accepted papers and active involvement in 27 workshops and tutorials. Google is also proud to be a Platinum Sponsor for the LatinX in CV workshop. We look forward to sharing some of our extensive computer vision research and expanding our partnership with the broader research community. Attending ICCV 2023? We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in computer vision, and check out som…  ( 94 min )

  • Open

    CGPT-4, how would an AI select the 100 most intelligent people on the planet based on their content rather than on their positions and awards?
    If an AI were to take on this task, it would primarily rely on data mining and natural language processing (NLP) techniques. First, the AI would scrape digital platforms to collect works authored by various people. Platforms could range from academic databases like PubMed or ArXiv to blogs, forums, and social media. The goal would be to capture a wide array of intellectual output, irrespective of the person's titles or accolades. Next, an NLP model would evaluate the collected content based on factors like originality, complexity, and coherence. Advanced sentiment analysis could be employed to gauge the depth of understanding and nuances in the arguments made. Specialized algorithms could also be developed to assess the impact of each piece of work, using metrics such as citations, social shares, or subsequent works that build upon it. The AI would then create a shortlist based on these evaluations. This stage might also involve unsupervised machine learning techniques like clustering to find patterns or commonalities among the top contenders. The final stage would be validation, possibly using reinforcement learning. The AI could simulate various scenarios or problems and predict how the content created by these individuals would contribute to solving them. It would then refine its list based on the simulated outcomes. This all-AI approach would drastically reduce human bias and could be executed relatively quickly. However, it's important to note that any such system would need to be designed carefully to avoid introducing biases present in the training data or algorithms. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    So it's unethical to kill an AI robot
    submitted by /u/bharath_brt [link] [comments]  ( 9 min )
    How Big Tech is co-opting the rising stars of artificial intelligence
    Big Tech's dominance in the artificial intelligence (AI) industry is growing as start-ups like Anthropic rely on their computing power and resources. Despite creating breakthrough AI technology, these start-ups still need the support of Big Tech to scale and succeed. The training of AI systems is expensive and requires specialized computer chips and data centers, which are mostly controlled by Amazon, Google, and Microsoft. Regulators, including the Federal Trade Commission and French competition authorities, are monitoring the industry for signs of anticompetitive behavior. Some business leaders believe that competition and efficiency will eventually drive down the cost of running AI models. Source : https://www.washingtonpost.com/technology/2023/09/30/anthropic-amazon-artificial-intelligence/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Data strategy >> Generative AI strategy
    A strong data strategy is crucial for the success of any AI strategy. Generative AI use cases depend on a healthy data infrastructure, including data governance, observability, catalog, data sharing, and lineage. Many enterprises lack the necessary data infrastructure to deploy customer-facing AI apps confidently. Poor data strategy and infrastructure can derail generative AI efforts. Existing issues with data ecosystems, such as data silos and poor data governance, will have a greater impact on generative AI workloads than new issues. Data silos, poor data discoverability, and the lack of data interoperability can become serious bottlenecks for generative AI apps. Source : https://nextword.substack.com/p/data-strategy-matters-for-generative submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Does anyone know a good AI tool to generate tattoo ideas and song cover art?
    Same as title submitted by /u/No-Educator-59 [link] [comments]  ( 9 min )
    Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
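The two ideas in this summary, spotting high-norm outlier tokens and giving the model dedicated register tokens, can be sketched on plain Python lists. This is only a toy under stated assumptions: the function names, the threshold, and the choice to put registers at the front of the sequence are hypothetical, and a real ViT would use learnable register embeddings inside the network.

```python
import math


def token_norms(tokens):
    """L2 norm of each token embedding."""
    return [math.sqrt(sum(x * x for x in tok)) for tok in tokens]


def high_norm_indices(tokens, threshold):
    """Indices of tokens whose norm exceeds the threshold (the outliers)."""
    return [i for i, n in enumerate(token_norms(tokens)) if n > threshold]


def add_registers(patch_tokens, register_tokens):
    """Extend the input sequence with extra tokens the model can use as scratch space."""
    return list(register_tokens) + list(patch_tokens)


def drop_registers(output_tokens, num_registers):
    """Discard register outputs so downstream code only sees patch tokens."""
    return output_tokens[num_registers:]


patches = [[1.0, 0.0], [30.0, 40.0], [0.0, 2.0]]   # middle token is an outlier
outliers = high_norm_indices(patches, threshold=10.0)

registers = [[0.0, 0.0], [0.0, 0.0]]               # stand-ins for learned values
sequence = add_registers(patches, registers)
recovered = drop_registers(sequence, num_registers=len(registers))
```

The registers change only the sequence length inside the model; the patch outputs used for attention maps or downstream tasks keep their original count and order.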
    There's So Many AI Chatbots, But Which One Is The Best? (Complete Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/1/2023
    Microsoft Researchers Introduce AutoGen: An Artificial Intelligence Framework for Simplifying the Orchestration, Optimization, and Automation of LLM Workflows.[1] StoriaBoard helps filmmakers, marketers and other storytellers pre-visualize stories. Simply upload your script, select a visual style, and generate hundreds of frames in seconds.[2] Will Hurd Releases A.I. Plan, a First in the Republican Presidential Field.[3] Sam Altman says AI systems will automate some tasks but also lead to ‘new and much better jobs’.[4] Sources: [1] https://www.marktechpost.com/2023/09/30/microsoft-researchers-introduce-autogen-an-artificial-intelligence-framework-for-simplifying-the-orchestration-optimization-and-automation-of-llm-workflows/?amp [2] https://www.producthunt.com/posts/storiaboard [3] https://www.nytimes.com/2023/09/20/us/politics/will-hurd-ai-plan.html [4] https://www.businessinsider.com/openai-sam-altman-ai-will-automate-tasks-create-better-jobs-2023-9?amp submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    This is no time for ease and comfort. It is time to dare and endure. -Winston Churchill
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 9 min )
    Quizlet AI reliability?
    What are everyone’s thoughts on the reliability of the Quizlet AI? I just talked to a friend and she said that she uses the AI to study with Quizlet. submitted by /u/immickle [link] [comments]  ( 9 min )
  • Open

    [R] The unsolved mystery at the heart of the "How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" paper
    submitted by /u/CellWithoutCulture [link] [comments]  ( 9 min )
    [D] How many instructions can LLMs handle before they start to ignore them?
    Prompt engineering frequently involves trying to encode very specific behaviors into a model to steer it a certain direction. In practice, as requirements become more complex, you often end up with fairly lengthy prompts, especially when using methods like RAG. I was wondering, how effective are LLMs at following instructions as the system prompt grows in size and complexity? I did some quick experiments on this and found that, unsurprisingly, GPT-4 can follow a lot of rules (up to 50) quite accurately. But even GPT-3.5 slowly degrades and Llama-2-70b-chat starts to fail after just a few rules. Comparison of performance metrics over increasing rule counts, demonstrating GPT-4's consistent performance and a decline in accuracy for GPT-3.5 and Llama-2-70b-chat. These results are based on …  ( 10 min )
    [R] LangDiversity: software to identify LLM errors
    Because of problems such as hallucination, detecting errors in a language model's output for a given prompt is an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. Install with: pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/rb0xg1ly8nrb1.png?width=1021&format=png&auto=webp&s=8e57056d24327ca2987abea12a7a9066a825738b submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
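To give a feel for what a domain-independent diversity measure can look like, the sketch below scores a set of answers sampled for the same prompt by their entropy and Gini impurity. This is a generic illustration and deliberately does not use or imitate the LangDiversity API, which is not shown in the post; the function names and the sample answers are made up.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical answer distribution.

    0.0 means every sample agreed; larger values mean more disagreement,
    which can be read as higher uncertainty.
    """
    n = len(answers)
    counts = Counter(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gini_impurity(answers):
    """Probability that two samples drawn with replacement disagree."""
    n = len(answers)
    counts = Counter(answers)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical samples for the same prompt at non-zero temperature.
consistent = ["42", "42", "42", "42"]
split = ["42", "41", "42", "41"]

scores = {
    "consistent": (answer_entropy(consistent), gini_impurity(consistent)),
    "split": (answer_entropy(split), gini_impurity(split)),
}
```

A prompt whose samples split evenly scores higher on both measures than one whose samples agree, so a threshold on either score could be used to flag outputs for review.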
    [P] Simplest model to run with limited hardware
    We want to run (not train, i.e. think single forward pass only) an ML algorithm on a machine with very limited resources. Which model could we use to show off the possibilities? If the benchmark is an MLP for binary image classification, what else could we do with a similar scale of operations? E.g. which model is the simplest for text-to-image generation? Are there any other ML models that are simple enough to run and, if initialized with good params, do something impressive? submitted by /u/2i2i_tokenized_time [link] [comments]  ( 9 min )
    [P] Deep Memory, a Way to Boost Retrieval Accuracy by up to +22% for RAG
    submitted by /u/davidbun [link] [comments]  ( 9 min )
    [D] Perplexity.ai Search Feasibility
    I've been using Perplexity.ai for a bit now when it hit me that I don't understand how they can sustain their business model with search. Stuff like Bing search and Google search cost around $5 or more per 1000 searches, so how can they even afford to do this kind of search? Do they have their own search index? Also, I don't know how they pull in the data from these sources so fast. I've played around with some things like this with Langchain with retrieval, but the speed of splitting and tokenizing website HTML is not very fast. Have they already pre-scraped the websites from the search results and tokenized them for LLM retrieval? submitted by /u/dragon18456 [link] [comments]  ( 9 min )
    Metagpt use case [D]
    Guys, I am currently building a project that involves certain tasks, like building an ML model for certain use cases. I wish to automate this work; do you think MetaGPT is a good fit for it? Let me know if you need any further information!! EDIT: One of the tasks my app needs to do is convert images to text (the aim is to implement image captioning). So, if I give MetaGPT the requirements for my project, is it possible that it will give me the code I need? I need to save effort on certain tasks here so that I can focus more on the operations and design side. Edit: it seems such vague questions are not encouraged on this platform; I will keep working and will ask better questions that meet the standards of this platform. Thanks!! Always have a massive respect for this community!! submitted by /u/aristotleTheFake [link] [comments]  ( 9 min )
    [R] Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: smoother and more meaningful attention maps; small boosts in downstream performance; way better object discovery abilities. The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
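The register idea in the summary above can be sketched in a few lines. This is only an illustration of the token bookkeeping, not the paper's code: the function names, the placement of the registers at the end of the sequence, the token dimension and the register count are all assumptions, and an identity step stands in for the transformer blocks.

```python
from typing import List

Vector = List[float]


def add_registers(cls_token: Vector,
                  patch_tokens: List[Vector],
                  register_tokens: List[Vector]) -> List[Vector]:
    """Build the input sequence: [CLS] + patch tokens + register tokens."""
    return [cls_token] + patch_tokens + register_tokens


def drop_registers(output_tokens: List[Vector], num_registers: int) -> List[Vector]:
    """Discard the register outputs so downstream heads never see them."""
    if num_registers == 0:
        return output_tokens
    return output_tokens[:-num_registers]


dim = 4                                                   # toy embedding size (assumption)
cls_token = [0.0] * dim
patch_tokens = [[float(i)] * dim for i in range(1, 10)]   # 9 patches from a 3x3 grid
register_tokens = [[0.5] * dim for _ in range(4)]         # learned parameters in a real model

sequence = add_registers(cls_token, patch_tokens, register_tokens)
outputs = sequence                                        # identity stand-in for the transformer blocks
kept = drop_registers(outputs, num_registers=len(register_tokens))
print(len(sequence), len(kept))
```

The point of the sketch is that registers take part in attention like any other token but are thrown away at the output, so they add scratch space without changing what the model returns.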
    [D] Multiple single class segmentation vs single multiclass segmentation models
    submitted by /u/waterstrider123 [link] [comments]  ( 9 min )
    [R] SOTA of Deep-Shallow Encoder-Decoder LLMs for fast inference
    There's some evidence [1] [2] that it's possible to run a text2text language model at substantially (potentially an order of magnitude) better inference speed by keeping the decoder shallow. I'm curious whether any general-reasoner, SOTA-style models of this sort are publicly available (a small model for machine translation is available at [3]). If not, how would one go about training one? Would it be necessary to do it entirely from scratch (extremely costly)? Or would it be possible to take, say, Flan-UL2 (20B), chop off its decoder, and train a much smaller decoder on top of it with the UL2 encoder frozen (i.e. the way one trains adapter layers)? Assuming the decoder hyperparameters are kept small, would this be possible within a reasonable compute budget? Would that even meaningfully converge with a small amount of compute (assuming the same training objective as for UL2)? Would the strength (i.e. somewhat comparable to 10B if we cut 20B in half) transfer from the SOTA encoder, or would cutting off half of the model's layers kneecap it too badly? [1] https://arxiv.org/pdf/2006.10369.pdf [2] https://aclanthology.org/2023.sustainlp-1.6.pdf [3] https://github.com/snoop2head/Deep-Encoder-Shallow-Decoder submitted by /u/upalse [link] [comments]  ( 9 min )
    [D] Duplicating layers in large models
    Is there any notable work on duplicating layers in large feed-forward models? In contrast to e.g. the brain, which is essentially a directed graph, most networks used nowadays take a feed-forward approach. E.g. transformers are able to attend to past tokens, but they generate tokens in such a way that, for a given token, a given weight is not used at different stages of the forward pass. My intuition is that this leads to an issue where concepts (factual data as well as learned "algorithms") might be duplicated, as they are needed at different depths in the generation process and are sequentially dependent on one another. This does not directly make the model less capable, as it might learn the same concept at two layers sufficiently well, but it reduces data and parameter efficiency and might impact generalization capabilities. Using a full-on brain-like graph might be hard to implement/optimize/scale on current hardware and is tricky with backprop. But is there any work on duplicating a few layers and placing them at different depths in large models? I would guess that this would be more impactful for large models. One would essentially trade compute for better data efficiency. submitted by /u/floriv1999 [link] [comments]  ( 9 min )
    [n] Introducing r/AudioAI: Any AI You Can Hear!
    I couldn't find any AI sub dedicated to audio, so I’ve created r/AudioAI to serve as a hub for everything at the intersection of artificial intelligence and the world of sounds. AI-driven music, speech, audio production, and all other AI audio technologies. If anyone wants to be part of mod, let me know! submitted by /u/chibop1 [link] [comments]  ( 9 min )
  • Open

    LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/o0v8p9g7tmrb1.png?width=1021&format=png&auto=webp&s=ff1ac672b61f96e4669663410769127066a0674d submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Equation for what neurons (of 1s that attach parietal region to conscious brain regions) should attach to microprocessor to offload math functions?
    " Bio education below *. Summarization: ~1000 IO neurons attach math regions to conscious regions, low cost 1000-electrod microprocessors can run on radio. * https://youtube.com/watch?v=bhp2CkNDxME Don't want for self; want for professors and humans who program KUKA's/FANUC's for construction, and for who do calculations/optimizations for CUDA, MS Visual Studio and such, but what go up for experimentation should funds allow." sounds fun submitted by /u/2002LuvAbbaLuvU [link] [comments]  ( 9 min )
  • Open

    Reinforcement Learning + Computer Vision listing papers
    Hello everyone! A while back, I stumbled upon an interesting paper that applied Reinforcement Learning to Object Localization. I got fascinated by how computer vision tasks could be transformed into a reinforcement learning problem, making it feel like a Markov decision process! So, I've decided to create a repository to compile all the existing (published) papers that delve into Reinforcement Learning in Computer Vision: https://github.com/rayanramoul/RLCV-Papers If you have any papers in mind or recommendations to enhance the repository, please don't hesitate to share them. Your input would be greatly appreciated! Thank you! :) submitted by /u/raysamram [link] [comments]  ( 9 min )
    Multi-Agent DQN not learning for Clean Up Game - Reward slowly decreasing
    The environment of the Clean Up game is simple: in a 25*18 grid world, there's dirt spawning on the left side and apples spawning on the other. Agents get a +1 reward for eating an apple (by stepping onto it). Agents also clean the dirt by stepping on it (no reward). An agent can go up, down, left, or right. The game goes on for 1000 steps. The apple spawn probability depends on the amount of dirt (the less dirt, the higher the probability). Currently, the observation for each agent has the Manhattan distance to its closest apple and dirt. I have tried multiple ways of training this, including changing the observation space of the agents. But it seems the result does not outperform random agents by any significant amount. The network is simple, it tries to take in all the observations for all the agen…  ( 10 min )
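The observation described above (Manhattan distance to the closest apple and the closest dirt) could be computed along these lines. This is a minimal sketch, not the poster's code: the function names and the example coordinates are hypothetical, and returning None when no target exists is an assumption about how an empty grid would be handled.

```python
from typing import Iterable, Optional, Tuple

Pos = Tuple[int, int]  # (x, y) cell on the 25*18 grid


def manhattan(a: Pos, b: Pos) -> int:
    """Grid distance when only up/down/left/right moves are allowed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_distance(agent: Pos, targets: Iterable[Pos]) -> Optional[int]:
    """Distance to the nearest target, or None if there are no targets."""
    distances = [manhattan(agent, t) for t in targets]
    return min(distances) if distances else None


def observe(agent: Pos, apples: Iterable[Pos], dirt: Iterable[Pos]) -> Tuple[Optional[int], Optional[int]]:
    """Per-agent observation: (distance to closest apple, distance to closest dirt)."""
    return closest_distance(agent, apples), closest_distance(agent, dirt)


obs = observe((12, 9), apples=[(20, 3), (24, 17)], dirt=[(1, 9), (0, 0)])
print(obs)  # (14, 11)
```

Note that a pure distance gives the agent no direction to move in, which may be worth checking when such an observation fails to beat random play.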
  • Open

    Entity Language Models: Monetizing Language Models – Part 2
    We must move beyond just taming…to monetizing Language Models! In part 1 of this series on Small Language Models (“Use Case Language Models: Taming the LLM Beast – Part 1”), I explored the business and operational value of Use Case-specific Small Language Models (Use Case Language Models). Use case language models are trained or adapted… The post Entity Language Models: Monetizing Language Models – Part 2 appeared first on Data Science Central.  ( 23 min )
  • Open

    Botober 2023
    Since 2019 I've generated October drawing prompts using the year's most state-of-the-art text-generating models. Every year the challenges are different, but this was one of the hardest years yet. Large language models like chatgpt, GPT-4, Bing Chat, and Bard, are all tweaked to produce generic, predictable  ( 6 min )
    Bonus: There was no 2020 Botober?
    AI Weirdness: the strange side of machine learning  ( 2 min )

  • Open

    [P]Handling categorical missing data in churn prediction model for telecom data
    I am working on a telecom dataset where I need to fit a model to predict churn (yes or no). Many of the categorical columns have missing values (7043 rows in total). What is the best way to handle missing data in this case: is it better to ignore it, or is there a better imputation method?
    Data columns (total 21 columns):
      customerID        7043 non-null  object
      gender            7043 non-null  object
      Age               7043 non-null  int64
      Partner           7043 non-null  object
      Dependents        7043 non-null  object
      tenure            7043 non-null  int64
      PhoneService      7043 non-null  object
      MultipleLines     6500 non-null  object
      InternetService   6500 non-null  object
      OnlineSecurity    7043 non-null  object
      OnlineBackup      7043 non-null  object
      DeviceProtection  7043 non-null  object
      TechSupport       7043 non-null  object
      StreamingTV       6500 non-null  object
      StreamingMovies   6500 non-null  object
      Contract          6500 non-null  object
      PaperlessBilling  7043 non-null  object
      PaymentMethod     6500 non-null  object
      MonthlyCharges    7043 non-null  float64
      TotalCharges      7043 non-null  object
      Churn             7043 non-null  object
    submitted by /u/guyloveskissing [link] [comments]  ( 9 min )
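Two common options for the six columns above that have 6500 of 7043 values present (543 missing, about 7.7%) are to keep missingness as its own category or to fill with the most frequent value. The sketch below shows both on toy rows using only the standard library. The rows, the "Missing" label and the function names are made up for illustration; in practice the same logic would be applied to the real dataframe, and which option works better is something to check with cross-validation.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_with_label(rows: List[Row], column: str, label: str = "Missing") -> List[Row]:
    """Treat missing values as their own category (keeps the 'was missing' signal)."""
    return [{**row, column: label if row[column] is None else row[column]} for row in rows]


def fill_with_mode(rows: List[Row], column: str) -> List[Row]:
    """Replace missing values with the most frequent observed value."""
    observed = [row[column] for row in rows if row[column] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**row, column: mode if row[column] is None else row[column]} for row in rows]


# Toy rows standing in for the telecom data (hypothetical values).
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": None, "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "No"},
]

labelled = fill_with_label(rows, "Contract")
imputed = fill_with_mode(rows, "Contract")
print(labelled[1]["Contract"], "|", imputed[1]["Contract"])
```

Keeping a separate "Missing" category is often the safer starting point for churn data, because the fact that a field was left blank can itself be predictive.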
    [D] (How) Can you estimate inference speed of a NN model on given hardware?
    How, outside of testing, do you estimate how quickly a specific model will run on some hardware? Anything about time is rarely mentioned in papers and if it is, it's more likely to talk about training, unless authors are specifically proud of their speed (like YOLO). Even less so in any README. Some way to translate numbers of parameters into seconds on a given GPU/CPU, any rules of thumb better than just setting up everything every time? submitted by /u/teleoflexuous [link] [comments]  ( 9 min )
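One rough rule of thumb for the question above is a roofline-style bound: a forward pass can be no faster than the time needed to do its arithmetic or the time needed to read its weights from memory, whichever is larger. The sketch below assumes about 2 FLOPs per parameter per sample, which is reasonable for fully connected and transformer-style layers (per token) but far too low for convolutional networks, where weights are reused across positions. The function name and the hardware figures are hypothetical, and real latency is usually worse than this lower bound.

```python
def estimate_latency_seconds(num_params: float,
                             bytes_per_param: float,
                             peak_flops: float,
                             memory_bandwidth: float,
                             batch_size: int = 1) -> float:
    """Lower-bound latency of one forward pass, in seconds.

    Assumes ~2 FLOPs (one multiply, one add) per parameter per sample and that
    every weight is read from memory once per forward pass.
    """
    flops = 2.0 * num_params * batch_size
    weight_bytes = num_params * bytes_per_param
    compute_time = flops / peak_flops               # time if purely compute-bound
    memory_time = weight_bytes / memory_bandwidth   # time if purely bandwidth-bound
    return max(compute_time, memory_time)


# Hypothetical example: 7e9 parameters in 16-bit precision on a device with
# 100 TFLOP/s peak compute and 1 TB/s memory bandwidth (made-up round numbers).
latency = estimate_latency_seconds(
    num_params=7e9,
    bytes_per_param=2,
    peak_flops=100e12,
    memory_bandwidth=1e12,
)
print(f"{latency * 1000:.1f} ms per forward pass (lower bound)")
```

With these numbers the memory term dominates at batch size 1, which is why parameter count together with memory bandwidth is often a better first guess than peak FLOP/s.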
    [D] How do I begin with AI ?
    I'm fairly new to the AI domain. I have decent Python knowledge. I've gone through a lot of YouTube tutorials and got stuck in tutorial hell. After struggling through hours of videos, I came here as my last hope!! How do I begin? What Python frameworks should I learn? Which particular books should I refer to? submitted by /u/Dry_Ad_3887 [link] [comments]  ( 9 min )
    [D] Struggling to get interviews what to do?
    Edit: I am a USA citizen so no need for sponsorship. I have 4 YOE at a startup and a PhD, with four publications (2 in high-level math journals and 2 CV/DL papers in A journals) and also 4 patents. I have experience with most common CV tasks, e.g. object detection, multi-object tracking, 2D/3D human pose estimation and monocular depth estimation. I’m well versed in typical network building blocks, e.g. conv nets, FFNs, transformers, diffusion, etc. I have a little experience with NLP, like NLTK and TTS networks, and with some other general dev technologies like EC2, S3, SQL, Mongoose, etc. That all being said, I can’t seem to even get interviews these days: just straight rejections, without ever talking to recruiters. On the other hand, in 2020 I was just searching for jobs passively and had something like a 75% success rate at getting interviews. I know the job market has changed, but I’m a lot more experienced now than I was then and am having abysmal luck. If anyone has any advice, I would be happy to share my resume if that would make it easier to give advice. Also open to hearing what other technologies I should/could learn. submitted by /u/AbjectDrink3276 [link] [comments]  ( 9 min )
    Arxiv [D]ives - Segment Anything
    Every Friday for the past few months we’ve been hosting a public paper club called “Arxiv Dives”. We pick a paper and dive deep into it and chat about it as a group. There are a lot of gems of knowledge hidden in these research papers, and the main motivation is simply to keep up with the most impactful techniques in the field by taking the time to dive in and discuss. The attendees so far have been great, and we would love for anyone who is interested to join! https://lu.ma/oxenbookclub submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [D] What exactly are the compute requirements for training a dense model versus an MoE?
    Hi, I'm new to ML and can't find a clear answer to this question. I find references online to a 1.8 trillion parameter model taking up the computational power of a 10B model, yet I also hear that the memory requirements are a lot higher for an MoE. If I were interested in training or running inference on, for example, a 15M dense model or a 60M MoE with 4 15M experts, what would the difference be? submitted by /u/vatsadev [link] [comments]  ( 9 min )
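The 15M dense versus 4 x 15M MoE comparison in the question can be worked through with simple arithmetic. The sketch below is a simplification under stated assumptions: each token is routed to exactly one expert (top-1 routing), there are no shared parameters such as attention or embeddings, the router cost is ignored, weights are stored in 32-bit floats, and a forward pass costs about 2 FLOPs per active parameter. Real MoE models only replicate some layers, so the actual ratios differ. The function name is hypothetical.

```python
from typing import Dict


def moe_summary(expert_params: int,
                num_experts: int,
                experts_per_token: int,
                shared_params: int = 0,
                bytes_per_param: int = 4) -> Dict[str, int]:
    """Rough size and per-token compute for a dense model (num_experts=1) or an MoE."""
    total = shared_params + expert_params * num_experts          # must all sit in memory
    active = shared_params + expert_params * experts_per_token   # used for any one token
    return {
        "total_params": total,
        "active_params": active,
        "weight_bytes": total * bytes_per_param,
        "forward_flops_per_token": 2 * active,
    }


dense = moe_summary(expert_params=15_000_000, num_experts=1, experts_per_token=1)
moe = moe_summary(expert_params=15_000_000, num_experts=4, experts_per_token=1)

print("dense:", dense)
print("moe:  ", moe)
```

Under these assumptions the MoE needs four times the weight memory of the dense model but roughly the same compute per token, which is the trade-off the 1.8T versus 10B figures in the question are pointing at.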
    [D] How close are we to Neuro-Symbolic architectures that are 100% accurate?
    I’m new to AI/ML and my understanding is that (1) LLMs are SOTA in many tasks, and their shortcomings, such as ~70% accuracy, hallucinations, inability to learn from small samples etc., are well known. (2) Neuro-symbolic approaches are apparently the way to get accuracy to 100% and solve other shortcomings. So the questions are: (3) What is the promising research in LLM+symbolic architectures? (4) How close is it to production, rather than academic? (5) Do we need non-LLM based architectures instead? submitted by /u/reeldeele [link] [comments]  ( 9 min )
    [D] How to Integrate fine tuned LLAMA 2 in website ?
    I'm an absolute beginner in machine learning. My team and I are building a chatbot that recommends medicine based on symptoms, and for that we are fine-tuning LLAMA 2. We are uploading books to train on, and we will ask questions based on those books. Somehow I found code on GitHub to fine-tune LLAMA 2. But how can I integrate it into my website? How do I connect it to my web app? I need some guidance. We have a submission in 2 weeks. We would be grateful if anyone is willing to mentor us on this project or just guide us on what to do. submitted by /u/BookAny3024 [link] [comments]  ( 9 min )
    [D] What algorithms to use for text classification
    I have some data: Twitter descriptions of events as text, and the events themselves. If I have 100000 tweets in column X and a category in Y (e.g. sporting event review, movie review, news, etc.), what is the best algorithm to match them? Should I turn the description into a bag of words and train an ML model (random forest, SVM, etc.) on the word frequencies, or can the algorithm take word order into account? submitted by /u/AnyJello605 [link] [comments]  ( 9 min )
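A bag-of-words baseline and a cheap way to capture some word order can be combined by counting word pairs (bigrams) alongside single words. The sketch below does that with a small multinomial Naive Bayes classifier written from scratch so that it runs with the standard library only. The class name, the toy tweets and the labels are invented for illustration, and Naive Bayes is a swapped-in baseline rather than one of the models named in the post. For 100000 real tweets an established library implementation would be the practical choice.

```python
import math
from collections import Counter, defaultdict
from typing import List


def features(text: str, max_n: int = 2) -> List[str]:
    """Unigrams plus bigrams; the bigrams keep a little word-order information."""
    words = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)        # class prior
            for feat in features(text):
                if feat in self.vocab:                      # skip features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical tweets and categories).
texts = [
    "great goal in the final minute",
    "the team won the match",
    "the film had a great plot",
    "the actor gave a great performance",
]
labels = ["sports", "sports", "movie", "movie"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("what a match the team won"))
print(model.predict("a great film"))
```

Setting max_n=1 in features gives a plain bag of words, so the two representations can be compared on held-out data to see whether word order helps for a given dataset.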
    [D] Deploy the Mistral 7b Generative Model on an A10 GPU on AWS
    Hello, The Mistral 7b AI model beats LLaMA 2 7b on all benchmarks and LLaMA 2 13b in many benchmarks. It is actually even on par with the LLaMA 1 34b model. So I made a quick video about how to deploy this model on an A10 GPU on an AWS EC2 g5.4xlarge instance: https://nlpcloud.com/deploy-mistral-7b-on-a10-gpu-on-aws.html I hope it will be useful. If you have recommendations about how to improve this video please don't hesitate to let me know, that will be very much appreciated! Julien submitted by /u/juliensalinas [link] [comments]  ( 9 min )
    [D] CIDEr values in PaLI model and XM 3600 dataset
    I am reading PaLI: A Jointly-Scaled Multilingual Language-Image Model . In their table 2 (page 6), it's reported that Thapliyal et al. (2022) (0.8B) model got 57.6 of CIDEr on XM 3600 for English. Thapliyal et al. (2022) is Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset. However in this paper, the CIDEr values are reported less than 1. For example, the largest model got 0.584 of CIDEr on XM 3600 for English. Could someone explain to me why those values have great differences? submitted by /u/KingsmanVince [link] [comments]  ( 9 min )
    [R] Pathway to self-learning mathematics and statistics for ML research
    Hey everyone. I am very passionate about getting into ML research and was wondering what the learning pathway is, particularly with regard to the theoretical math and statistics involved. For context: I am a second-year undergraduate. By the end of this year, I will have finished a Multivariable Calculus with Proofs course, so that is my current starting point. I have been working with ML for the last 3 years and am proficient in Python and frameworks like PyTorch. I have also made my own implementations of several research papers (LSTMs, GRUs, Transformers, ELMo, BERT, GPT, as well as a few computer vision papers). I have a good general intuition of how deep learning works, but I want to formalise this knowledge with an adequate mathematical background so that I can eventually pursue a career in research. I understand that I have plenty of time until I get there, and I am willing to dedicate it to grinding out the math and statistical knowledge required. I have done my research on this sub and other forums, and here are a few resources that stood out:
      Mathematics for Machine Learning by Deisenroth, Faisal and Ong
      Advanced Calculus of Several Variables by C. H. Edwards Jr.
      Mathematical Methods Lecture Notes from Imperial College by Deisenroth and Cheraghchi
      The original information theory paper by Shannon
      The Elements of Statistical Learning by Hastie, Tibshirani and Friedman
      Pattern Recognition and Machine Learning by Bishop
      The Probabilistic Machine Learning Series by Kevin P. Murphy
      Deep Learning by Goodfellow, Bengio and Courville
      Mathematics of Machine Learning on MIT OCW (here)
    My question is, what order should I start self-learning in, given the (somewhat limited) background knowledge I have? Also, are there any other resources that would help? submitted by /u/Far_Clothes_5054 [link] [comments]  ( 10 min )
    [D] What is the best open-source framework to create a synthetic and domain specific dataset for fine-tuning small models?
    Hi everyone, With the different data points, such as phi-1.5 performance being as good as 7b models on some tasks, it seems to be plausible that small models can be quite capable on specific tasks. I am working on BlindChat, an open-source and private solution to run small LLMs on your browser and I am interested in fine-tuning a phi-1.5 on some domain specific data. I am thinking of having an approach similar to the researchers of the phi paper, which is creating a high quality dataset using GPT3.5 / GPT4. Do you know good open-source frameworks that make it easy to create a high quality data for a specific task using an existing large model, like GPT3.5/4 or Llama 2 70b? submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [P] How do I train or tune an LLM like LLaMA for my business
    I want to tune Facebook's LLaMA or any available LLM model to be able to answer questions about my business. The idea is to provide a prompt of the business and some Q&As, then based on the provided information, the AI chatbot will answer customers who ask questions about the business. If the answers to the questions are not known or the question is not relevant, the bot should say "I dont know". submitted by /u/the_aceix [link] [comments]  ( 9 min )
  • Open

    I have blocked user u/NuseAI ...
    Hi, I have never done this before, but I have blocked user u/NuseAI from my feeds. He/she is posting 'news' all over the AI subs, including this one, at the moment and is filling up my timeline ... and I simply don't feel right about what they are up to. Is it an AI bot? Is it a karma farmer? Is it some sort of spam? Am I being over cautious? If the consensus is that they are a normal poster - fine - I'll reenable their posts. In the meantime I'm enjoying a less cluttered feed! submitted by /u/MrEloi [link] [comments]  ( 9 min )
    'Counterfeit people': The danger posed by Meta’s AI celebrity lookalike chatbots
    Meta has launched chatbots with personalities similar to certain celebrities, which some experts believe could be dangerous. These chatbots have their own faces and social media accounts, and Meta is working on giving them a voice. However, experts argue that the idea of chatbots with personalities is impossible, as algorithms cannot demonstrate intention or free will. There is also a risk that chatbots with personalities could express problematic opinions, as seen in Meta's testing. Meta's project is driven by profit, as users are more likely to engage with chatbots that seem human. Experts believe that Meta should have explained the limits of these chatbots instead of emphasizing their human characteristics. Source : https://www.france24.com/en/technology/20230930-counterfeit-people-the-dangers-posed-by-meta-s-ai-celebrity-lookalike-chatbots submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Artificially Intelligent, Genuinely Creative: How AI's Triumph Over Human Creators Exposes the Illusion of Intellectual Property
    submitted by /u/DukeWilder [link] [comments]  ( 9 min )
    Is my domain name a good idea? What can I build on it? Go Go AI Go dot com .... No webpage on it now, any good ideas???
    I was cooking chicken wings one evening ago in the not too distant past and this idea popped into my head. Before the night was over I went online and bought the domain name of GoGoAIGo . com and then the .ai version also. I put the dot com version up on Sedo (sedo.com/search/details/?domain=GoGoAIGo.com) for sale and I actually now own the .com .ai .org and .net versions of that phrase. Not only my decade but the two generational decades in front of me and the one generational decade behind me can remember our ole Inspector Gadget friend whom had a similar phrase, but not exact, that he would say. I'm an individual whom may hold onto something if I feel it has intrinsic value for a future development, which I think this can if laid out in an appropriate fashion. I'm working on another business project right now and I own some trademarks for my other business project so I'm not exactly a newbie in ways here I'm just kind of fresh to the AI realm studies. I think it's overblown right now but will be fine tuned over the next 5-7 years better and society will find a better seat for it. I could see this domain being like a search engine or something, maybe even something to do with robots. I expect AI robots moving forward will be regulated and have various classes that they are placed into as we integrate certain ones in our society. Let's be honest, the light-switch isn't flipping overnight or even in one quick year over this AI stuff. I'm in no rush to have a piece of AI wash my dishes for me to be honest. The last robotic thing I was thinking about getting was a robot mower to cut a field, I believe they are working on those now. Anybody have any unique ideas for me? I used to play with lego robots way back in high school in the early 2000's.... Seems like this website would make a great search engine but honestly there are other phrases that can be put into play with society also. Thanks for any mental stimulation you can toss in my direction. 
    submitted by /u/Wise_Cut_2543 [link] [comments]  ( 10 min )
    CGPT-4, how could an AI app designed to move people from their screens to better enjoying the people in their life do this?
    Imagine an app that's like a helpful buddy in your pocket, always looking out for the best moments to nudge you into some real-world socializing. For example, say you're a fan of watching sports. The app notices you frequently check scores or read articles on sports sites during weekends. Right before a big game, it pops up and says, "How about inviting some friends over to watch the game?" Now let's talk about making socializing a sort of game. Think of the way Fitbit rewards you for walking 10,000 steps. Similarly, this app could reward you with "social points" for various activities. Invite a friend for coffee? 10 points. Call your mom? 15 points. Organize a barbecue? 50 points. And so on. These points could unlock virtual badges or even real-world rewards like discounts at local restaurants to encourage you to keep going. When it comes to setting personal goals, let's say you've been wanting to improve your relationship with a sibling. You set a goal in the app to have at least one meaningful conversation with them each week. The app then reminds you on a lazy Sunday afternoon, suggesting, "Why not call your sister now? It’s a good time to catch up." And for reflection, after you've hung out with your friends to watch the game or had that talk with your sister, the app asks you to rate how good you felt on a scale of 1-10. Over time, you'll see a graph of your happiness levels correlated with your social activities, making it super clear that quality time with people is a mood booster. The whole idea is to keep it simple but effective, helping you to naturally weave more social interactions into your life without making you feel overwhelmed or stressed. submitted by /u/Georgeo57 [link] [comments]  ( 10 min )
    Is AI a Platform Shift?
    AI has the potential to be a platform shift, similar to previous shifts like personal computers, the internet, and mobile. Platform shifts change the dominant layer that applications are built on and can capture the majority of value from the previous generation. AI could change distribution, business models, and what's possible in workflows. Changes in distribution could lead to new aggregators replacing old ones, making the aggregation of quality content more difficult. The business model may not change significantly, with AI likely being delivered as software-as-a-service. AI can enable new workflows and drastically change existing ones. While incumbents may accrue significant value, new platforms could also replace old ones. Source : https://matt-rickard.com/is-ai-a-platform-shift submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there a market for Small Language Models for specific jobs/domains?
    It seems that large language models are getting bigger and bigger, and by growing they need more and more processing power. I know that some LLM developers have made smaller versions to test how small they can be made and function. But what happens when you want a LLM to do a specific job, surely it only needs a fraction of the data a general-purpose model does. Potential benefits of SLMs: Less data. Potentially faster. Less space to hallucinate/go wrong. Smaller set of potentials for complete testing. Running costs reduced. Lower spec hardware needs. Has anyone tried dedicating a LLM to a specific job/task and then optimizing its data size to create a SLM? TLDR; How large does a LLM have to be for a toaster or microwave? Talkie Toaster https://www.youtube.com/watch?v=vLm6oTCFcxQ submitted by /u/Arowx [link] [comments]  ( 9 min )
    Books 3 has revealed thousands of pirated Australian books. In the age of AI, is copyright law still fit for purpose?
    submitted by /u/Jariiari7 [link] [comments]  ( 9 min )
    Deep dive into Mastering Prompt Engineering (Prompt-tier list)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Looking for open source headless text to singing or better yet MIDI to singing software
    Scoured the Internet using all available tools. All I've come up with is proprietary and obsolete software and/or GUI-based software. My goal is to create an ElevenLabs type api but for singing. Something like Flinger (dead) would be ideal. If I can't find it I plan to write it but I'd rather not reinvent the wheel. submitted by /u/geeezeredm [link] [comments]  ( 9 min )
    Is it possible for AI to deeply analyze importance of thousands of daily news?
    I have access to texts of thousands of world news daily. Is it possible to make an AI that would analyze them and sort by importance? All I could find similar is NLP for analyzing text content and extracting keywords, or metadata, but this approach doesn't work well. I want for AI to grasp the essence of news and deeply understand their importance, to comprehend how an event affects many people's lives and has significant impact on society or the world as a whole. submitted by /u/canman44999 [link] [comments]  ( 9 min )
    Dalle-3 has me thinking about my unborn child and reality itself.
    I was able to throw these images together in seconds and it has me stunned. This is all in the first year of mainstream AI. Where are we going to be this time next year.. Philosophically what do you believe is going to happen to our paradigms of reality over the coming years? This is an especially challenging thought because we consume so much content and information digitally. I'm a little worried about how humans will or will not adjust to this incoming technology. I'm having my first child soon and it's interesting to think about what I may have to teach them. That nothing you consume digitally is real, only what you can experience with all 5 senses in your local environment is. Strange thoughts to be having for sure. With peace, Aqua. submitted by /u/Aquaritek [link] [comments]  ( 9 min )
    The Ethical Dilemmas of AI in Sci-Fi and Reality
    An interesting article about ethics and AI in the real world versus what we find in scifi. Exploring points like privacy invasion, possible sentience, control and moral decisions. https://discover.hubpages.com/technology/the-ethical-dilemmas-of-ai-in-sci-fi-and-reality submitted by /u/No_Adhesiveness_7209 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/29/2023
    Meta Platforms (META.O) Chief Executive Mark Zuckerberg on Wednesday rolled out new AI products for consumers, including bots that create photo-realistic images and smart glasses that answer questions, as well as an updated virtual-reality headset.[1] The European Union is examining alleged anticompetitive practices in chips used for artificial intelligence, a market that Nvidia (NVDA.O) dominates, Bloomberg News reported on Friday, citing people familiar with the matter.[2] Sex robots powered by futuristic AI algorithm will one day give humans the best sex of their lives, it has been sensationally claimed.[3] National Security Agency Director Army Gen. Paul M. Nakasone today announced the creation of a new entity to oversee the development and integration of artificial intelligence capabilities within U.S. national security systems.[4] Sources: [1] https://www.reuters.com/technology/meta-signal-future-arvr-investments-annual-connect-conference-2023-09-27/ [2] https://www.reuters.com/technology/eu-starts-early-stage-probe-into-nvidia-dominated-ai-chip-market-abuses-2023-09-29/ [3] https://www.dailystar.co.uk/news/weird-news/sex-robots-using-ai-give-31059169 [4] https://www.defense.gov/News/News-Stories/Article/Article/3541838/ai-security-center-to-open-at-national-security-agency/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Consecutive coupon collector problem
    Coupon collector problem: Suppose you have a bag of balls labeled 1 through 1,000. You draw balls one at a time and put them back after each draw. How many draws would you have to make before you’ve seen every ball at least once? This is the coupon collector problem with N = 1000, […] Consecutive coupon collector problem first appeared on John D. Cook.  ( 6 min )
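The teaser above only states the standard coupon collector setup, so the following is a minimal sketch of that setup alone, not of the "consecutive" variant the post goes on to discuss. It uses the well-known result that the expected number of draws is N times the N-th harmonic number. The function names `expected_draws` and `simulate_draws` are illustrative, not taken from the post.

```python
import random


def expected_draws(n):
    """Expected draws to see all n labels at least once: n * H_n,
    where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def simulate_draws(n, rng):
    """Draw labels uniformly with replacement until every label has appeared."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


if __name__ == "__main__":
    n = 1000
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    trials = [simulate_draws(n, rng) for _ in range(200)]
    print("theory:    ", round(expected_draws(n), 2))
    print("simulation:", sum(trials) / len(trials))
```

For N = 1000 the formula gives roughly 7,485 draws on average; the simulated mean should land in the same neighborhood, though individual runs vary widely.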
  • Open

    Testing RNN with RLlib
    Hi folks! Since you've saved my ass before, maybe you have an idea about my issue here, too. I'm training and testing a custom RNN, but I receive the following error message: File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper return func(self, *a, **k) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1291, in _compute_action_helper dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__ res = self.forward(restored, state or [], seq_lens) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/models/torch/recurrent_net.py", line 92, in forward i…  ( 9 min )
  • Open

    RACH-Space: Reconstructing Adaptive Convex Hull Space with applications in weak supervision. (arXiv:2307.04870v3 [cs.LG] UPDATED)
    We introduce RACH-Space, a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. RACH-Space offers simplicity in implementation with minimal assumptions on the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon geometrical interpretation of the space spanned by weak signals. Our analysis of the high dimensional convex hull structure underlying general set of weak signals bridges geometry with machine learning. Empirical results also demonstrate that RACH-Space works well in practice and compares favorably to best existing label models for weakly supervised learning.  ( 2 min )
    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity. (arXiv:2309.16512v1 [cs.LG])
    In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.  ( 2 min )
    MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network. (arXiv:2309.16374v1 [cs.LG])
    Property prediction plays an important role in material discovery. As an initial step to eventually develop a foundation model for material science, we introduce a new autoencoder called the MHG-GNN, which combines graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.  ( 2 min )
    Group-Agent Reinforcement Learning. (arXiv:2202.05135v4 [cs.LG] UPDATED)
    It can largely benefit the reinforcement learning (RL) process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively. Different from multi-agent reinforcement learning (MARL) where multiple agents are in a common environment and should learn to cooperate or compete with each other, in this case each agent has its separate environment and only communicates with others to share knowledge without any cooperative or competitive behaviour as a learning outcome. In fact, this scenario exists widely in real life whose concept can be utilised in many applications, but is not well understood yet and not well formulated. As the first effort, we propose group-agent system for RL as a formulation of this scenario and the third type of RL system with respect to single-agent and multi-agent systems. We then propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL). We show through experiments that DDAL achieved desirable performance with very stable training and has good scalability.  ( 2 min )
    AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models. (arXiv:2309.16414v1 [cs.CV])
    Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates over templates obtained from a large language model to templates built from random words and characters. In contrast, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, that is: classify to the class that maximizes the cosine similarity between its averaged encoded class descriptors and the encoded image. However, weighting all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP assigns to each prompt template per-image weights, which are derived from statistics of class descriptor-image similarities at inference time. AutoCLIP is fully unsupervised, has very low overhead, and can be easily implemented in a few lines of code. We show that for a broad range of vision-language models, datasets, and prompt templates, AutoCLIP outperforms baselines consistently and by up to 3 percentage points in accuracy.  ( 2 min )
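The baseline rule quoted in the abstract (pick the class whose averaged descriptor embedding has the highest cosine similarity with the image embedding) can be sketched as below. This is only the equal-weight baseline, not AutoCLIP's per-image weighting, whose details the abstract does not give. The embeddings are assumed to be plain lists of floats produced elsewhere by some encoder; `zero_shot_classify` and the toy vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb, class_descriptor_embs):
    """Equal-weight baseline: average each class's unit-normalized descriptor
    embeddings, then return the class with the highest cosine similarity
    to the image embedding, together with all class scores.

    class_descriptor_embs maps a class name to a list of embedding vectors.
    """
    img = normalize(image_emb)
    scores = {}
    for label, descriptors in class_descriptor_embs.items():
        unit = [normalize(d) for d in descriptors]
        # Every descriptor gets the same weight 1/len(unit).
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        prototype = normalize(mean)
        scores[label] = sum(a * b for a, b in zip(img, prototype))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs.
    classes = {
        "cat": [[1.0, 0.0], [0.9, 0.1]],
        "dog": [[0.0, 1.0], [0.1, 0.9]],
    }
    label, scores = zero_shot_classify([1.0, 0.1], classes)
    print(label, scores)
```

A per-image weighting scheme would replace the uniform `1/len(unit)` average with weights that depend on the image; how those weights are computed is left to the paper.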
    An Uncertainty-Aware Pseudo-Label Selection Framework using Regularized Conformal Prediction. (arXiv:2309.15963v1 [cs.LG])
    Consistency regularization-based methods are prevalent in semi-supervised learning (SSL) algorithms due to their exceptional performance. However, they mainly depend on domain-specific data augmentations, which are not usable in domains where data augmentations are less practicable. On the other hand, Pseudo-labeling (PL) is a general and domain-agnostic SSL approach that, unlike consistency regularization-based methods, does not rely on the domain. PL underperforms due to the erroneous high-confidence predictions from poorly calibrated models. This paper proposes an uncertainty-aware pseudo-label selection framework that employs uncertainty sets yielded by the conformal regularization algorithm to fix the poor calibration of neural networks, reducing noisy training data. The code for this work is available at: https://github.com/matinmoezzi/ups conformal classification  ( 2 min )
    Compilation as a Defense: Enhancing DL Model Attack Robustness via Tensor Optimization. (arXiv:2309.16577v1 [cs.LG])
    Adversarial Machine Learning (AML) is a rapidly growing field of security research, with an often overlooked area being model attacks through side-channels. Previous works show such attacks to be serious threats, though little progress has been made on efficient remediation strategies that avoid costly model re-engineering. This work demonstrates a new defense against AML side-channel attacks using model compilation techniques, namely tensor optimization. We show relative model attack effectiveness decreases of up to 43% using tensor optimization, discuss the implications, and direction of future work.  ( 2 min )
    Compositional Program Generation for Systematic Generalization. (arXiv:2309.16467v1 [cs.LG])
    Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful of examples. Machine learning models, including the now ubiquitous transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures motivates this study on a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features (modularity, type abstraction, and recursive composition) that enable it to generalize both systematically to new concepts in a few-shot manner and productively by length on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs. CPG learns parameters for the semantic modules and is able to learn the semantics for new types incrementally. Given a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, in both standard and extreme few-shot settings.  ( 3 min )
    Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples. (arXiv:2309.16143v1 [cs.LG])
    Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.  ( 3 min )
    Instance-Agnostic Geometry and Contact Dynamics Learning. (arXiv:2309.05832v2 [cs.CV] UPDATED)
    This work presents an instance-agnostic learning framework that fuses vision with dynamics to simultaneously learn shape, pose trajectories, and physical properties via the use of geometry as a shared representation. Unlike many contact learning approaches that assume motion capture input and a known shape prior for the collision model, our proposed framework learns an object's geometric and dynamic properties from RGBD video, without requiring either category-level or instance-level shape priors. We integrate a vision system, BundleSDF, with a dynamics system, ContactNets, and propose a cyclic training pipeline to use the output from the dynamics module to refine the poses and the geometry from the vision module, using perspective reprojection. Experiments demonstrate our framework's ability to learn the geometry and dynamics of rigid and convex objects and improve upon the current tracking framework.  ( 2 min )
    Safe Imitation Learning of Nonlinear Model Predictive Control for Flexible Robots. (arXiv:2212.02941v2 [cs.RO] UPDATED)
    Flexible robots may overcome some of the industry's major challenges, such as enabling intrinsically safe human-robot collaboration and achieving a higher load-to-mass ratio. However, controlling flexible robots is complicated due to their complex dynamics, which include oscillatory behavior and a high-dimensional state space. NMPC offers an effective means to control such robots, but its extensive computational demands often limit its application in real-time scenarios. To enable fast control of flexible robots, we propose a framework for a safe approximation of NMPC using imitation learning and a predictive safety filter. Our framework significantly reduces computation time while incurring a slight loss in performance. Compared to NMPC, our framework shows more than an eightfold improvement in computation time when controlling a three-dimensional flexible robot arm in simulation, all while guaranteeing safety constraints. Notably, our approach outperforms conventional reinforcement learning methods. The development of fast and safe approximate NMPC holds the potential to accelerate the adoption of flexible robots in industry.  ( 2 min )
    Deep learning models for price forecasting of financial time series: A review of recent advancements: 2020-2022. (arXiv:2305.04811v2 [q-fin.ST] UPDATED)
    Accurately predicting the prices of financial time series is essential and challenging for the financial sector. Owing to recent advancements in deep learning techniques, deep learning models are gradually replacing traditional statistical and machine learning models as the first choice for price forecasting tasks. This shift in model selection has led to a notable rise in research related to applying deep learning models to price forecasting, resulting in a rapid accumulation of new knowledge. Therefore, we conducted a literature review of relevant studies over the past three years with a view to aiding researchers and practitioners in the field. This review delves deeply into deep learning-based forecasting models, presenting information on model architectures, practical applications, and their respective advantages and disadvantages. In particular, detailed information is provided on advanced models for price forecasting, such as Transformers, generative adversarial networks (GANs), graph neural networks (GNNs), and deep quantum neural networks (DQNNs). The present contribution also includes potential directions for future research, such as examining the effectiveness of deep learning models with complex structures for price forecasting, extending from point prediction to interval prediction using deep learning models, scrutinising the reliability and validity of decomposition ensembles, and exploring the influence of data volume on model performance.  ( 3 min )
    TinyMetaFed: Efficient Federated Meta-Learning for TinyML. (arXiv:2307.06822v3 [cs.LG] UPDATED)
    The field of Tiny Machine Learning (TinyML) has made substantial advancements in democratizing machine learning on low-footprint devices, such as microcontrollers. The prevalence of these miniature devices raises the question of whether aggregating their knowledge can benefit TinyML applications. Federated meta-learning is a promising answer to this question, as it addresses the scarcity of labeled data and heterogeneous data distribution across devices in the real world. However, deploying TinyML hardware faces unique resource constraints, making existing methods impractical due to energy, privacy, and communication limitations. We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network initialization that can be quickly fine-tuned on new devices. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication, computational efficiency via online learning, and robustness to client heterogeneity through few-shot learning. The evaluations on three TinyML use cases demonstrate that TinyMetaFed can significantly reduce energy consumption and communication overhead, accelerate convergence, and stabilize the training process.  ( 2 min )
    Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance. (arXiv:2309.16604v1 [stat.ML])
    Pairwise comparison of graphs is key to many applications in machine learning, ranging from clustering and kernel-based classification/regression to, more recently, supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists of representing graphs as metric measure spaces, which makes it possible to leverage Optimal Transport and the meaningful distances it provides for comparing them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs in which both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.  ( 2 min )
    Capturing the Diffusive Behavior of the Multiscale Linear Transport Equations by Asymptotic-Preserving Convolutional DeepONets. (arXiv:2306.15891v3 [cs.LG] UPDATED)
    In this paper, we introduce two types of novel Asymptotic-Preserving Convolutional Deep Operator Networks (APCONs) designed to address the multiscale time-dependent linear transport problem. We observe that the vanilla physics-informed DeepONets with modified MLP may exhibit instability in maintaining the desired limiting macroscopic behavior. Therefore, this necessitates the utilization of an asymptotic-preserving loss function. Drawing inspiration from the heat kernel in the diffusion equation, we propose a new architecture called Convolutional Deep Operator Networks, which employ multiple local convolution operations instead of a global heat kernel, along with pooling and activation operations in each filter layer. Our APCON methods possess a parameter count that is independent of the grid size and are capable of capturing the diffusive behavior of the linear transport problem. Finally, we validate the effectiveness of our methods through several numerical examples.  ( 2 min )
    Visual In-Context Learning for Few-Shot Eczema Segmentation. (arXiv:2309.16656v1 [cs.CV])
    Automated diagnosis of eczema from digital camera images is crucial for developing applications that allow patients to self-monitor their recovery. An important component of this is the segmentation of eczema regions from such images. Current methods for eczema segmentation rely on deep neural networks such as convolutional (CNN)-based U-Net or transformer-based Swin U-Net. While effective, these methods require a high volume of annotated data, which can be difficult to obtain. Here, we investigate the capabilities of visual in-context learning that can perform few-shot eczema segmentation with just a handful of examples and without any need for retraining models. Specifically, we propose a strategy for applying in-context learning for eczema segmentation with a generalist vision model called SegGPT. When benchmarked on a dataset of annotated eczema images, we show that SegGPT with just 2 representative example images from the training dataset performs better (mIoU: 36.69) than a CNN U-Net trained on 428 images (mIoU: 32.60). We also discover that using a larger number of examples for SegGPT may in fact be harmful to its performance. Our result highlights the importance of visual in-context learning in developing faster and better solutions to skin imaging tasks. Our result also paves the way for developing inclusive solutions that can cater to minorities in the demographics who are typically heavily under-represented in the training data.  ( 2 min )
    Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices. (arXiv:2309.06612v2 [cs.LG] UPDATED)
    The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate multiscale information from diverse data sources. MM-NNs extract and fuse features from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodality AI is emerging as a cutting-edge option in Internet of Things (IoT) systems where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain.  ( 2 min )
    Learning Large-Scale MTP$_2$ Gaussian Graphical Models via Bridge-Block Decomposition. (arXiv:2309.13405v2 [cs.LG] UPDATED)
    This paper studies the problem of learning large-scale Gaussian graphical models that are multivariate totally positive of order two ($\text{MTP}_2$). By introducing the concept of a bridge, which commonly exists in large-scale sparse graphs, we show that the entire problem can be equivalently optimized through (1) several smaller-scale sub-problems induced by a \emph{bridge-block decomposition} on the thresholded sample covariance graph and (2) a set of explicit solutions on entries corresponding to \emph{bridges}. From a practical perspective, this simple and provable discipline can be applied to break down a large problem into small tractable ones, leading to an enormous reduction in computational complexity and substantial improvements for all existing algorithms. The synthetic and real-world experiments demonstrate that our proposed method presents a significant speed-up compared to the state-of-the-art benchmarks.  ( 2 min )
    Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy. (arXiv:1911.09307v2 [cs.LG] UPDATED)
    Regularization plays a crucial role in machine learning models, especially for deep neural networks. The existing regularization techniques mainly rely on the i.i.d. assumption and only consider the knowledge from the current sample, without leveraging the neighboring relationship between samples. In this work, we propose a general regularizer called \textbf{Patch-level Neighborhood Interpolation~(Pani)} that conducts a non-local representation in the computation of networks. Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two kinds of popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp as well as its variants. The first derived \textbf{Pani VAT} presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived \textbf{Pani MixUp} method extends MixUp, and achieves superiority over MixUp and competitive performance against state-of-the-art variants of the MixUp method with a significant advantage in computational efficiency. Extensive experiments have verified the effectiveness of our Pani approach in both supervised and semi-supervised settings.  ( 2 min )
    Delay-Aware Hierarchical Federated Learning. (arXiv:2303.12414v4 [cs.LG] UPDATED)
    Federated learning has gained popularity as a means of training models distributed across the wireless edge. The paper introduces delay-aware hierarchical federated learning (DFL) to improve the efficiency of distributed machine learning (ML) model training by accounting for communication delays between edge and cloud. Different from traditional federated learning, DFL leverages multiple stochastic gradient descent iterations on local datasets within each global aggregation period and intermittently aggregates model parameters through edge servers in local subnetworks. During global synchronization, the cloud server consolidates local models with the outdated global model using a local-global combiner, thus preserving crucial elements of both, enhancing learning efficiency under the presence of delay. A set of conditions is obtained to achieve the sub-linear convergence rate of O(1/k) for strongly convex and smooth loss functions. Based on these findings, an adaptive control algorithm is developed for DFL, implementing policies to mitigate energy consumption and communication latency while aiming for sublinear convergence. Numerical evaluations show DFL's superior performance in terms of faster global model convergence, reduced resource consumption, and robustness against communication delays compared to existing FL algorithms. In summary, this proposed method offers improved efficiency and results when dealing with both convex and non-convex loss functions.  ( 2 min )
    On the Trade-offs between Adversarial Robustness and Actionable Explanations. (arXiv:2309.16452v1 [cs.LG])
    As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.  ( 2 min )
    Cross-Prediction-Powered Inference. (arXiv:2309.16598v1 [stat.ML])
    While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.  ( 2 min )
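The impute-then-debias idea in the abstract can be illustrated for the simplest target, a population mean. The sketch below is an assumption-laden toy, not the paper's implementation: the K-fold cross-fitting scheme, the interleaved fold assignment, the one-variable least-squares model in `fit_line`, and the name `cross_prediction_mean` are all illustrative choices. It returns a point estimate only and does not construct confidence intervals.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for any predictive model."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def cross_prediction_mean(x_lab, y_lab, x_unlab, k=2):
    """Estimate E[Y] from a small labeled set and a large unlabeled set.

    For each fold, a model is trained on the other folds, used to impute
    labels for the unlabeled data, and debiased with its residuals on the
    held-out labeled fold (data the model never saw during training).
    """
    n = len(x_lab)
    folds = [list(range(j, n, k)) for j in range(k)]  # interleaved folds
    imputed = 0.0
    debias = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit_line([x_lab[i] for i in train], [y_lab[i] for i in train])
        # Average imputed label over the unlabeled data, averaged across folds.
        imputed += fmean([model(x) for x in x_unlab]) / k
        # Held-out residuals correct for systematic prediction error.
        debias += sum(y_lab[i] - model(x_lab[i]) for i in held_out) / n
    return imputed + debias


if __name__ == "__main__":
    x_lab = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y_lab = [2 * x + 1 for x in x_lab]
    print(cross_prediction_mean(x_lab, y_lab, [10.0, 20.0, 30.0]))
```

When the model is exactly right, the residual term vanishes and the estimate is just the mean imputed label; when the model is biased, the held-out residuals shift the estimate back toward the truth.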
    Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning. (arXiv:2309.16286v1 [cs.LG])
    Federated learning is an important privacy-preserving multi-party learning paradigm, involving collaborative learning with others and local updating on private data. Model heterogeneity and catastrophic forgetting are two crucial challenges, which greatly limit the applicability and generalizability. This paper presents FCCL+, a novel federated correlation and similarity learning method with non-target distillation, facilitating both intra-domain discriminability and inter-domain generalization. For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication between the heterogeneous participants. We construct a cross-correlation matrix and align the instance similarity distribution at both the logit and feature levels, which effectively overcomes the communication barrier and improves generalization ability. For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non Target Distillation, which retains inter-domain knowledge while avoiding the optimization conflict issue, fully distilling privileged inter-domain information by depicting posterior class relations. Considering that there is no standard benchmark for evaluating existing heterogeneous federated learning under the same setting, we present a comprehensive benchmark with extensive representative methods under four domain shift scenarios, supporting both heterogeneous and homogeneous federated settings. Empirical results demonstrate the superiority of our method and the efficiency of modules on various scenarios.  ( 2 min )
    Language models in molecular discovery. (arXiv:2309.16235v1 [physics.chem-ph])
    The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.  ( 2 min )
    Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear Systems via Sums-of-Squares Optimization. (arXiv:2304.12405v2 [cs.RO] UPDATED)
    We present a method for synthesizing dynamic, reduced-order output-feedback polynomial control policies for control-affine nonlinear systems which guarantees runtime stability to a goal state, when using visual observations and a learned perception module in the feedback control loop. We leverage Lyapunov analysis to formulate the problem of synthesizing such policies. This problem is nonconvex in the policy parameters and the Lyapunov function that is used to prove the stability of the policy. To solve this problem approximately, we propose two approaches: the first solves a sequence of sum-of-squares optimization problems to iteratively improve a policy which is provably-stable by construction, while the second directly performs gradient-based optimization on the parameters of the polynomial policy, and its closed-loop stability is verified a posteriori. We extend our approach to provide stability guarantees in the presence of observation noise, which realistically arises due to errors in the learned perception module. We evaluate our approach on several underactuated nonlinear systems, including pendula and quadrotors, showing that our guarantees translate to empirical stability when controlling these systems from images, while baseline approaches can fail to reliably stabilize the system.
    Probabilistic Invariant Learning with Randomized Linear Classifiers. (arXiv:2308.04412v2 [cs.LG] UPDATED)
    Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions trade off invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use fewer resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistically invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using fewer resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
    Set Learning for Accurate and Calibrated Models. (arXiv:2307.02245v3 [cs.LG] UPDATED)
    Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-$k$-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We provide theoretical justification, establishing that OKO naturally yields better calibration, and provide extensive experimental analyses that corroborate our theoretical findings. We emphasize that OKO is a general framework that can be easily adapted to many settings and the trained model can be applied to single examples at inference time, without introducing significant run-time overhead or architecture changes.
    Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification. (arXiv:2309.16369v1 [cs.SD])
    The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the audio scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to show better generalisation than flat minima (even more so for out-of-domain data, recorded from previously unseen devices), thus adding to the dispute about better generalisation capabilities of flat minima. We further find that, in particular, the choice of optimisers is a main driver of the sharpness of minima and we discuss resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
    Attribute Graph Clustering via Learnable Augmentation. (arXiv:2212.03559v2 [cs.LG] UPDATED)
    Contrastive deep graph clustering (CDGC) utilizes contrastive learning to group nodes into different clusters. Better augmentation techniques benefit the quality of the contrastive samples, thus being one of the key factors to improve performance. However, the augmentation samples in existing methods are always predefined from human experience and agnostic to the downstream clustering task, thus leading to high human resource costs and poor performance. To this end, we propose an Attribute Graph Clustering method via Learnable Augmentation (AGCLA), which introduces learnable augmentors for high-quality and suitable augmented samples for CDGC. Specifically, we design two learnable augmentors for attribute and structure information, respectively. Besides, two refinement matrices, including the high-confidence pseudo-label matrix and the cross-view sample similarity matrix, are generated to improve the reliability of the learned affinity matrix. During the training procedure, we notice that there exist differences between the optimization goals for training learnable augmentors and contrastive learning networks. In other words, we should guarantee both the consistency of the embeddings and the diversity of the augmented samples. Thus, an adversarial learning mechanism is designed in our method. Moreover, a two-stage training strategy is leveraged for the high-confidence refinement matrices. Extensive experimental results demonstrate the effectiveness of AGCLA on six benchmark datasets.
    DynaBench: A benchmark dataset for learning dynamical systems from low-resolution data. (arXiv:2306.05805v2 [cs.LG] UPDATED)
    Previous work on learning physical systems from data has focused on high-resolution grid-structured measurements. However, real-world knowledge of such systems (e.g. weather data) relies on sparsely scattered measuring stations. In this paper, we introduce a novel simulated benchmark dataset, DynaBench, for learning dynamical systems directly from sparsely scattered data without prior knowledge of the equations. The dataset focuses on predicting the evolution of a dynamical system from low-resolution, unstructured measurements. We simulate six different partial differential equations covering a variety of physical systems commonly used in the literature and evaluate several machine learning models, including traditional graph neural networks and point cloud processing models, with the task of predicting the evolution of the system. The proposed benchmark dataset is expected to advance the state of the art as an out-of-the-box easy-to-use tool for evaluating models in a setting where only unstructured low-resolution observations are available. The benchmark is available at https://anonymous.4open.science/r/code-2022-dynabench/.
    Vertical Federated Learning: Concepts, Advances and Challenges. (arXiv:2211.12814v4 [cs.LG] UPDATED)
    Vertical Federated Learning (VFL) is a federated learning setting where multiple parties with different features about the same set of users jointly train machine learning models without exposing their raw data or model parameters. Motivated by the rapid growth in VFL research and real-world applications, we provide a comprehensive review of the concept and algorithms of VFL, as well as current advances and challenges in various aspects, including effectiveness, efficiency, and privacy. We provide an exhaustive categorization for VFL settings and privacy-preserving protocols and comprehensively analyze the privacy attacks and defense strategies for each protocol. In the end, we propose a unified framework, termed VFLow, which considers the VFL problem under communication, computation, privacy, as well as effectiveness and fairness constraints. Finally, we review the most recent advances in industrial applications, highlighting open challenges and future directions for VFL.
    The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering. (arXiv:2309.15954v1 [cs.CV])
    The quality of pre-training data plays a critical role in the performance of foundation models. Popular foundation models often design their own recipe for data filtering, which makes it hard to analyze and compare different data filtering approaches. DataComp is a new benchmark dedicated to evaluating different methods for data filtering. This paper describes our learning and solution when participating in the DataComp challenge. Our filtering strategy includes three stages: single-modality filtering, cross-modality filtering, and data distribution alignment. We integrate existing methods and propose new solutions, such as computing CLIP score on horizontally flipped images to mitigate the interference of scene text, using vision and language models to retrieve training samples for target downstream tasks, rebalancing the data distribution to improve the efficiency of allocating the computational budget, etc. We slice and dice our design choices, provide in-depth analysis, and discuss open questions. Our approach outperforms the best method from the DataComp paper by over 4% on the average performance of 38 tasks and by over 2% on ImageNet.
    Machine Learning Based Analytics for the Significance of Gait Analysis in Monitoring and Managing Lower Extremity Injuries. (arXiv:2309.15990v1 [cs.LG])
    This study explored the potential of gait analysis as a tool for assessing post-injury complications, e.g., infection, malunion, or hardware irritation, in patients with lower extremity fractures. The research focused on the proficiency of supervised machine learning models predicting complications using consecutive gait datasets. We identified patients with lower extremity fractures at an academic center. Patients underwent gait analysis with a chest-mounted IMU device. Using software, raw gait data was preprocessed, emphasizing 12 essential gait variables. Machine learning models including XGBoost, Logistic Regression, SVM, LightGBM, and Random Forest were trained, tested, and evaluated. Attention was given to class imbalance, addressed using SMOTE. We introduced a methodology to compute the Rate of Change (ROC) for gait variables, independent of the time difference between gait analyses. XGBoost was the optimal model both before and after applying SMOTE. Prior to SMOTE, the model achieved an average test AUC of 0.90 (95% CI: [0.79, 1.00]) and test accuracy of 86% (95% CI: [75%, 97%]). Feature importance analysis attributed importance to the duration between injury and gait analysis. Data patterns showed early physiological compensations, followed by stabilization phases, emphasizing prompt gait analysis. This study underscores the potential of machine learning, particularly XGBoost, in gait analysis for orthopedic care. Predicting post-injury complications, early gait assessment becomes vital, revealing intervention points. The findings support a shift in orthopedics towards a data-informed approach, enhancing patient outcomes.
    Neuro-Inspired Hierarchical Multimodal Learning. (arXiv:2309.15877v1 [cs.LG])
    Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
    Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation. (arXiv:2309.16173v1 [cs.LG])
    Graph unlearning has emerged as a pivotal method to delete information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. An unlearning method enables the GNN model to comply with data protection regulations (i.e., the right to be forgotten), adapt to evolving data distributions, and reduce the GPU-hours carbon footprint by avoiding repetitive retraining. Existing partitioning and aggregation-based methods have limitations due to their poor handling of local graph dependencies and additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to address these challenges in graph unlearning through knowledge distillation, as it distills to delete in GNN (D2DGN). It is a model-agnostic distillation framework where the complete graph knowledge is divided and marked for retention and deletion. It performs distillation with response-based soft targets and feature-based node embedding while minimizing KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about the retained graph elements. D2DGN surpasses the performance of existing methods when evaluated on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, better performance in removing target elements, preservation of performance for the retained elements, and zero overhead costs. Notably, our D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass, and is up to $3.2\times$ faster.
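    The abstract mentions distillation with response-based soft targets under a KL-divergence objective. A minimal sketch of that generic ingredient follows; it is not D2DGN itself. The function names, the temperature value, and the use of a single teacher are assumptions made for illustration, and how the paper splits knowledge into retained and deleted parts is not shown.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Response-based distillation: KL between softened teacher and student outputs.

    The temperature of 2.0 is a hypothetical choice, not taken from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)

same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

    The loss is zero when the student reproduces the teacher's soft targets and grows as the two output distributions diverge.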
    Can LLMs Effectively Leverage Structural Information for Graph Learning: When and Why. (arXiv:2309.16595v1 [cs.LG])
    This paper studies Large Language Models (LLMs) for structured data--particularly graphs--a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks. To address the "when" question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. For the "why" question, we probe into two potential contributing factors to the LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence indicating that the performance of LLMs is significantly attributed to data leakage; and (iii) the performance of LLMs on a target node is strongly positively related to the local homophily ratio of the node.
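    The local homophily ratio in finding (iii) is commonly defined as the fraction of a node's neighbours that share its label. The sketch below assumes that common definition on an undirected graph; the abstract does not spell out the paper's exact formula, and the toy graph and labels are made up.

```python
from collections import defaultdict

def local_homophily(edges, labels):
    """Return {node: fraction of neighbours with the same label as the node}."""
    neighbours = defaultdict(set)
    for u, v in edges:
        # Treat the graph as undirected.
        neighbours[u].add(v)
        neighbours[v].add(u)
    ratios = {}
    for node, nbrs in neighbours.items():
        same = sum(1 for n in nbrs if labels[n] == labels[node])
        ratios[node] = same / len(nbrs)
    return ratios

# Toy example (hypothetical data).
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
labels = {0: "a", 1: "a", 2: "b", 3: "a"}
ratios = local_homophily(edges, labels)
```

    In this toy graph, node 0 has two of three neighbours with its own label, while node 2 has none, so a model relying on neighbour information would be expected to do better on node 0.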
    MotionLM: Multi-Agent Motion Forecasting as Language Modeling. (arXiv:2309.16534v1 [cs.CV])
    Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
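    To make the idea of representing continuous trajectories as discrete motion tokens concrete, here is a minimal sketch that quantizes per-step displacements on a uniform grid. The bin count, the clipping range, and the function names are assumptions for illustration; the abstract does not describe MotionLM's actual tokenizer.

```python
def tokenize_trajectory(points, max_delta=4.0, bins=9):
    """Map consecutive (dx, dy) displacements to integer tokens on a bins x bins grid."""
    step = 2 * max_delta / (bins - 1)

    def bucket(delta):
        # Clip to the supported range, then snap to the nearest bin index.
        delta = max(-max_delta, min(max_delta, delta))
        return round((delta + max_delta) / step)

    tokens = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        tokens.append(bucket(x1 - x0) * bins + bucket(y1 - y0))
    return tokens

def detokenize(tokens, start, max_delta=4.0, bins=9):
    """Invert tokenize_trajectory up to quantization error."""
    step = 2 * max_delta / (bins - 1)
    points = [start]
    for token in tokens:
        ix, iy = divmod(token, bins)
        x, y = points[-1]
        points.append((x + ix * step - max_delta, y + iy * step - max_delta))
    return points

trajectory = [(0, 0), (1, 0), (3, 1), (3, -1)]
tokens = tokenize_trajectory(trajectory)
recovered = detokenize(tokens, start=trajectory[0])
```

    Once trajectories are token sequences like this, a standard next-token objective can be applied to them, which is the framing the abstract describes.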
    Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection. (arXiv:2309.16592v1 [cs.CV])
    The primary bottleneck towards obtaining good recognition performance in IR images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Realizing that object detection methods for the RGB modality are quite robust (at least for some commonplace classes, like person, car, etc.), thanks to the giant training sets that exist, in this work we seek to leverage cues from the RGB modality to scale object detectors to the IR modality, while preserving model performance in the RGB modality. At the core of our method is a novel tensor decomposition method called TensorFact which splits the convolution kernels of a layer of a Convolutional Neural Network (CNN) into low-rank factor matrices, with fewer parameters than the original CNN. We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting, while encouraging them to capture complementary cues from those trained only on the RGB modality. We validate our approach empirically by first assessing how well our TensorFact decomposed network performs at the task of detecting objects in RGB images vis-a-vis the original network and then examining how well it adapts to IR images of the FLIR ADAS v1 dataset. For the latter, we train models under scenarios that pose challenges stemming from data paucity. From the experiments, we observe that: (i) TensorFact shows performance gains on RGB images; (ii) further, this pre-trained model, when fine-tuned, outperforms a standard state-of-the-art object detector on the FLIR ADAS v1 dataset by about 4% in terms of mAP 50 score.
    Towards Poisoning Fair Representations. (arXiv:2309.16487v1 [cs.LG])
    Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups such as elder and female. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream tasks. Despite the development of FRL methods, their vulnerability under data poisoning attack, a popular protocol to benchmark model robustness under adversarial scenarios, is under-explored. Data poisoning attacks have been developed for classical fair machine learning methods which incorporate fairness constraints into shallow-model classifiers. Nonetheless, these attacks fall short in FRL due to notably different fairness goals and model architectures. This work proposes the first data poisoning framework attacking FRL. We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data. This attack entails a prohibitive bilevel optimization, wherefore an effective approximated solution is proposed. A theoretical analysis on the needed number of poisoning samples is derived and sheds light on defending against the attack. Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.
    Differentially Private Secure Multiplication: Hiding Information in the Rubble of Noise. (arXiv:2309.16105v1 [cs.IT])
    We consider the problem of private distributed multi-party multiplication. It is well-established that Shamir secret-sharing coding strategies can enable perfect information-theoretic privacy in distributed computation via the celebrated algorithm of Ben-Or, Goldwasser and Wigderson (the "BGW algorithm"). However, perfect privacy and accuracy require an honest majority, that is, $N \geq 2t+1$ compute nodes are required to ensure privacy against any $t$ colluding adversarial nodes. By allowing for some controlled amount of information leakage and approximate multiplication instead of exact multiplication, we study coding schemes for the setting where the number of honest nodes can be a minority, that is, $N < 2t+1$. We develop a tight characterization of the privacy-accuracy trade-off for cases where $N < 2t+1$ by measuring information leakage using differential privacy instead of perfect privacy, and using the mean squared error metric for accuracy. A novel technical aspect is an intricately layered noise distribution that merges ideas from differential privacy and Shamir secret-sharing at different layers.  ( 2 min )
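    The $N \geq 2t+1$ requirement can be seen in a short sketch of textbook Shamir secret sharing: multiplying two degree-$t$ sharings pointwise gives a degree-$2t$ sharing of the product, which needs $2t+1$ points to interpolate. This illustrates only the classical exact setting, not the paper's differentially private scheme; the field prime and function names are choices made here, and BGW's degree-reduction step is omitted.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime, chosen here for convenience

def share(secret, n, t):
    """Split `secret` into n shares using a random degree-t polynomial over GF(PRIME)."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

t, n = 2, 5  # n = 2t + 1 nodes
a_shares = share(7, n, t)
b_shares = share(6, n, t)
# Each node multiplies its two shares locally; the result lies on a degree-2t polynomial.
product_shares = [(x, ya * yb % PRIME) for (x, ya), (_, yb) in zip(a_shares, b_shares)]
product = reconstruct(product_shares)      # uses all 2t + 1 points
a_recovered = reconstruct(a_shares[:t + 1])  # t + 1 points suffice for a degree-t sharing
```

    With fewer than $2t+1$ nodes the product polynomial is underdetermined, which is the regime where the abstract trades exactness and perfect privacy for approximate multiplication and differential privacy.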
    Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption. (arXiv:2309.16409v1 [stat.ML])
    The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.  ( 2 min )
    LawBench: Benchmarking Legal Knowledge of Large Language Models. (arXiv:2309.16289v1 [cs.CL])
    Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark LawBench. LawBench has been meticulously crafted to precisely assess the LLMs' legal capabilities from three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs in legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides in-depth understanding of the LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.  ( 3 min )
    Stackelberg Batch Policy Learning. (arXiv:2309.16188v1 [stat.ML])
    Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which show that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well as or better than state-of-the-art methods in batch RL benchmark and real-world datasets.  ( 2 min )
    Improving Adaptive Online Learning Using Refined Discretization. (arXiv:2309.16044v1 [cs.LG])
    We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.  ( 2 min )
    Masked autoencoders are scalable learners of cellular morphology. (arXiv:2309.16064v1 [cs.CV])
    Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when training larger models on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high-end of our scale, a ViT-L/8 trained on over 3.5-billion unique crops sampled from 95-million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.  ( 2 min )
    Deep Learning Based Uplink Multi-User SIMO Beamforming Design. (arXiv:2309.16603v1 [cs.IT])
    The advancement of fifth generation (5G) wireless communication networks has created a greater demand for wireless resource management solutions that offer high data rates, extensive coverage, minimal latency and energy-efficient performance. Nonetheless, traditional approaches have shortcomings when it comes to computational complexity and their ability to adapt to dynamic conditions, creating a gap between theoretical analysis and the practical execution of algorithmic solutions for managing wireless resources. Deep learning-based techniques offer promising solutions for bridging this gap with their substantial representation capabilities. We propose a novel unsupervised deep learning framework, which is called NNBF, for the design of uplink receive multi-user single input multiple output (MU-SIMO) beamforming. The primary objective is to enhance the throughput by focusing on maximizing the sum-rate while also offering a computationally efficient solution, in contrast to established conventional methods. We conduct experiments for several antenna configurations. Our experimental results demonstrate that NNBF exhibits superior performance compared to our baseline methods, namely, zero-forcing beamforming (ZFBF) and minimum mean square error (MMSE) equalizer. Additionally, NNBF is scalable to the number of single-antenna user equipments (UEs), while baseline methods incur a significant computational burden due to the matrix pseudo-inverse operation.
    Astroconformer: The Prospects of Analyzing Stellar Light Curves with Transformer-Based Deep Learning Models. (arXiv:2309.16316v1 [astro-ph.SR])
    Light curves of stars encapsulate a wealth of information about stellar oscillations and granulation, thereby offering key insights into the internal structure and evolutionary state of stars. Conventional asteroseismic techniques have been largely confined to power spectral analysis, neglecting the valuable phase information contained within light curves. While recent machine learning applications in asteroseismology utilizing Convolutional Neural Networks (CNNs) have successfully inferred stellar attributes from light curves, they are often limited by the local feature extraction inherent in convolutional operations. To circumvent these constraints, we present $\textit{Astroconformer}$, a Transformer-based deep learning framework designed to capture long-range dependencies in stellar light curves. Our empirical analysis, which focuses on estimating surface gravity ($\log g$), is grounded in a carefully curated dataset derived from $\textit{Kepler}$ light curves. These light curves feature asteroseismic $\log g$ values spanning from 0.2 to 4.4. Our results underscore that, in the regime where the training data is abundant, $\textit{Astroconformer}$ attains a root-mean-square-error (RMSE) of 0.017 dex around $\log g \approx 3 $. Even in regions where training data are sparse, the RMSE can reach 0.1 dex. It outperforms not only the K-nearest neighbor-based model ($\textit{The SWAN}$) but also state-of-the-art CNNs. Ablation studies confirm that the efficacy of the models in this particular task is strongly influenced by the size of their receptive fields, with larger receptive fields correlating with enhanced performance. Moreover, we find that the attention mechanisms within $\textit{Astroconformer}$ are well-aligned with the inherent characteristics of stellar oscillations and granulation present in the light curves.
    Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. (arXiv:2309.16042v1 [cs.LG])
    Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or interchange intervention, is a standard technique for this task (Vig et al., 2020), but the literature contains many variants with little consensus on the choice of hyperparameters or methodology. In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results. Backed by empirical observations, we give conceptual arguments for why certain metrics or methods may be preferred. Finally, we provide recommendations for the best practices of activation patching going forwards.
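    As an illustration of the technique named in the abstract above, the following is a minimal sketch of activation patching on a toy two-layer ReLU network. The network, its weights, the function names, and the normalised logit-difference metric are all assumptions made for this example; they are not taken from the paper, and the metric shown is only one of the choices such a study might compare.

```python
def relu(v):
    return max(0.0, v)

def forward(x, w_in, w_out, patch=None):
    """Run a toy two-layer ReLU net; `patch` maps hidden index -> forced activation."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value  # overwrite this unit with the cached activation
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return hidden, logits

def logit_diff(logits):
    # Metric (assumed for this sketch): logit of class 0 minus logit of class 1.
    return logits[0] - logits[1]

def patching_scores(clean_x, corrupt_x, w_in, w_out):
    """Patch each hidden unit from the clean run into the corrupted run.

    Returns, per unit, the fraction of the clean-vs-corrupted logit difference
    that the patch restores. Assumes the two runs differ on the metric,
    otherwise the normalisation divides by zero.
    """
    clean_hidden, clean_logits = forward(clean_x, w_in, w_out)
    _, corrupt_logits = forward(corrupt_x, w_in, w_out)
    clean_ld = logit_diff(clean_logits)
    corrupt_ld = logit_diff(corrupt_logits)
    scores = []
    for idx in range(len(clean_hidden)):
        _, patched = forward(corrupt_x, w_in, w_out, patch={idx: clean_hidden[idx]})
        scores.append((logit_diff(patched) - corrupt_ld) / (clean_ld - corrupt_ld))
    return scores

# Hypothetical weights chosen so that the two hidden units matter unequally.
W_IN = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [[2.0, 0.0], [0.0, 1.0]]
scores = patching_scores([1.0, 0.0], [0.0, 1.0], W_IN, W_OUT)
```

    In this toy setup, patching unit 0 restores a larger share of the clean behaviour than patching unit 1, which is the kind of localization signal the technique is meant to surface. Swapping the metric or the corrupted input would change the scores, which is the sensitivity the abstract describes.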
    GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization. (arXiv:2309.16020v1 [cs.CV])
    Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. This task has considerable challenges due to immense variation in geographic landscapes. The image-to-image retrieval-based approaches fail to solve this problem on a global scale as it is not feasible to construct a large gallery of images covering the entire world. Instead, existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. However, their performance is limited by the predefined classes and often results in inaccurate localizations when an image's location significantly deviates from its class center. To overcome these limitations, we propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations. GeoCLIP's location encoder models the Earth as a continuous function by employing positional encoding through random Fourier features and constructing a hierarchical representation that captures information at varying resolutions to yield a semantically rich high-dimensional feature suitable to use even beyond geo-localization. To the best of our knowledge, this is the first work employing GPS encoding for geo-localization. We demonstrate the efficacy of our method via extensive experiments and ablations on benchmark datasets. We achieve competitive performance with just 20% of training data, highlighting its effectiveness even in limited-data settings. Furthermore, we qualitatively demonstrate geo-localization using a text query by leveraging the CLIP backbone of our image encoder.
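    The random Fourier feature encoding mentioned in the abstract above can be sketched as follows. This is a generic illustration, not GeoCLIP's actual encoder: the function names, the scaling of latitude and longitude to [-1, 1], the choice of frequency scales, and the concatenation of scales are all assumptions made for the example.

```python
import math
import random

def make_rff_encoder(in_dim, num_features, sigma, seed=0):
    """Return x -> [cos(2*pi*Bx), sin(2*pi*Bx)] with B drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
             for _ in range(num_features)]

    def encode(x):
        proj = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x))
                for row in freqs]
        return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

    return encode

def make_hierarchical_encoder(sigmas, num_features, seed=0):
    """Concatenate encodings at several frequency scales (an assumed design)."""
    encoders = [make_rff_encoder(2, num_features, s, seed + i)
                for i, s in enumerate(sigmas)]

    def encode(lat_deg, lon_deg):
        # Assumed normalisation: map degrees to [-1, 1] before projecting.
        x = (lat_deg / 90.0, lon_deg / 180.0)
        out = []
        for enc in encoders:
            out.extend(enc(x))
        return out

    return encode

# Small sigma gives coarse, smooth features; large sigma gives fine-grained ones.
encode_gps = make_hierarchical_encoder(sigmas=[1.0, 16.0, 256.0], num_features=8)
features = encode_gps(48.8566, 2.3522)
```

    Each scale contributes a cosine and a sine per random frequency, so nearby coordinates get similar low-frequency features while high-frequency features separate them. In a retrieval setup, a vector like this would typically be passed through a learned network before being compared with image embeddings.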
    Correcting for heterogeneity in real-time epidemiological indicators. (arXiv:2309.16546v1 [cs.LG])
    Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at a finer spatial and temporal resolution, larger coverage, and lower latency than traditional surveillance signals. We describe the problem of spatial and temporal heterogeneity in signals derived from these data sources, where spatial and/or temporal biases are present. We present a method to use a ``guiding'' signal to correct for these biases and produce a more reliable signal that can be used for modeling and forecasting. The method assumes that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time. We also present a hyperparameter selection algorithm to choose the parameters representing the matrix rank and degree of temporal smoothness of the corrections. In the absence of ground truth, we use maps and plots to argue that this method does indeed reduce heterogeneity. Reducing heterogeneity from auxiliary data sources greatly increases their utility in modeling and forecasting epidemics.
    High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality. (arXiv:2309.16476v1 [math.ST])
    We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
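    For readers unfamiliar with the Huber loss discussed in the abstract above, here is a minimal sketch of its standard textbook form, with $\delta$ as the threshold between the quadratic and linear regimes. The function names are illustrative, and the sketch says nothing about the paper's asymptotic results or its optimal tuning of $\delta$.

```python
def huber_loss(residual, delta):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)

def mean_huber_loss(residuals, delta):
    """Average Huber loss over a list of residuals (empirical risk)."""
    return sum(huber_loss(r, delta) for r in residuals) / len(residuals)

# A single large residual dominates the squared loss but not the Huber loss.
residuals = [0.5, -0.5, 10.0]
squared = sum(0.5 * r * r for r in residuals) / len(residuals)
robust = mean_huber_loss(residuals, delta=1.0)
```

    The linear tail is what limits the influence of heavy-tailed noise: the outlying residual contributes in proportion to its magnitude rather than its square.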
    A Spectral Approach for Learning Spatiotemporal Neural Differential Equations. (arXiv:2309.16131v1 [cs.LG])
    Rapidly developing machine learning methods have stimulated research interest in computationally reconstructing differential equations (DEs) from observational data which may provide additional insight into underlying causative mechanisms. In this paper, we propose a novel neural-ODE based method that uses spectral expansions in space to learn spatiotemporal DEs. The major advantage of our spectral neural DE learning approach is that it does not rely on spatial discretization, thus allowing the target spatiotemporal equations to contain long range, nonlocal spatial interactions that act on unbounded spatial domains. Our spectral approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains. By developing a spectral framework for learning both PDEs and integro-differential equations, we extend machine learning methods to apply to unbounded DEs and a larger class of problems.
    AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models. (arXiv:2307.12499v2 [cs.LG] UPDATED)
    Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often utilize Generative Adversarial Networks (GANs), which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable to generate high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective to generate unrestricted adversarial examples, which outperforms GAN-based methods in terms of attack performance and generation quality.
    Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness. (arXiv:2309.16096v1 [cs.LG])
    The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations via Inverse Reinforcement Learning. (arXiv:2309.16074v1 [cs.RO])
    Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
    VAE-based latent-space classification of RNO-G data. (arXiv:2309.16401v1 [astro-ph.HE])
    The Radio Neutrino Observatory in Greenland (RNO-G) is a radio-based ultra-high energy neutrino detector located at Summit Station, Greenland. It is still being constructed, with 7 stations currently operational. Neutrino detection works by measuring Askaryan radiation produced by neutrino-nucleon interactions. A neutrino candidate must be found amidst other backgrounds which are recorded at much higher rates -- including cosmic-rays and anthropogenic noise -- the origins of which are sometimes unknown. Here we describe a method to classify different noise classes using the latent space of a variational autoencoder. The latent space forms a compact representation that makes classification tractable. We analyze data from a noisy and a silent station. The method automatically detects and allows us to qualitatively separate multiple event classes, including physical wind-induced signals, for both the noisy and the quiet station.
    Contrastive Learning of Temporal Distinctiveness for Survival Analysis in Electronic Health Records. (arXiv:2308.13104v2 [cs.LG] UPDATED)
    Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies mostly employed ranking methods to pursue an ordering objective, contrastive methods which learn a discriminative embedding by having data contrast against each other, have not been explored thoroughly for survival analysis. Therefore, in this paper, we propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework that utilizes survival durations from both censored and observed data to define temporal distinctiveness and construct negative sample pairs with adjustable hardness for contrastive learning. Specifically, we first use an ontological encoder and a sequential self-attention encoder to represent the longitudinal EHR data with rich contexts. Second, we design a temporal contrastive loss to capture varying survival durations in a supervised setting through a hardness-aware negative sampling mechanism. Last, we incorporate the contrastive task into the time-to-event predictive task with multiple loss components. We conduct extensive experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI), a critical and urgent medical condition. The effectiveness and explainability of the proposed model are validated through comprehensive quantitative and qualitative studies.
    Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. (arXiv:2309.16429v1 [cs.LG])
    We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse.
    Graph-level Representation Learning with Joint-Embedding Predictive Architectures. (arXiv:2309.16014v1 [cs.LG])
    Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal $y$ from a context signal $x$. JEPAs bypass the need for data augmentation and negative samples, which are typically required by contrastive learning, while avoiding the overfitting issues associated with generative-based pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm and propose Graph-JEPA, the first JEPA for the graph domain. In particular, we employ masked modeling to learn embeddings for different subgraphs of the input graph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative training objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Extensive validation shows that Graph-JEPA can learn representations that are expressive and competitive in both graph classification and regression problems.
    Causal Policy Gradient for Whole-Body Mobile Manipulation. (arXiv:2305.04866v4 [cs.RO] UPDATED)
    Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, which is generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the common multi-objective nature of the task, e.g., efficiently reaching a goal while avoiding obstacles. Current approaches often segregate tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g. learning base actions for locomotion objectives and learning arm actions for manipulation). This solution prevents simultaneous combinations of locomotion and interaction degrees of freedom and requires human domain knowledge for both partitioning the action space and matching the action parts to the sub-objectives. In this paper, we introduce Causal MoMa, a new reinforcement learning framework to train policies for typical MoMa tasks that makes use of the most favorable subspace of the robot's action space to address each sub-objective. Causal MoMa automatically discovers the causal dependencies between actions and terms of the reward function and exploits these dependencies through causal policy gradient that reduces gradient variance compared to previous state-of-the-art reinforcement learning algorithms, improving convergence and results. We evaluate the performance of Causal MoMa on three types of simulated robots across different MoMa tasks and demonstrate success in transferring the policies trained in simulation directly to a real robot, where our agent is able to follow moving goals and react to dynamic obstacles while simultaneously and synergistically controlling the whole-body: base, arm, and head. More information at https://sites.google.com/view/causal-moma.
    Tiny Classifier Circuits: Evolving Accelerators for Tabular Data. (arXiv:2303.00031v2 [cs.AR] UPDATED)
    A typical machine learning (ML) development cycle for edge computing is to maximise the performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for classification of tabular data with comparable prediction performance to conventional ML techniques while using substantially fewer hardware resources and power. The proposed methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. Classifier circuits are so tiny (i.e., consisting of no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and can efficiently be implemented in ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology or "Auto Tiny Classifiers" on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet and a neural search over Multi-Layer Perceptrons. Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance in comparison to the best-performing ML baseline. When synthesised as a Silicon chip, Tiny Classifiers use 8-18x less area and 4-8x less power. When implemented as an ultra-low cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.
    General In-Hand Object Rotation with Vision and Touch. (arXiv:2309.09979v2 [cs.RO] UPDATED)
    We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.
    Enhancing Speech Articulation Analysis using a Geometric Transformation of the X-ray Microbeam Dataset. (arXiv:2305.10775v3 [eess.AS] UPDATED)
    Accurate analysis of speech articulation is crucial for speech analysis. However, X-Y coordinates of articulators strongly depend on the anatomy of the speakers and the variability of pellet placements, and existing methods for mapping anatomical landmarks in the X-ray Microbeam Dataset (XRMB) fail to capture the entire anatomy of the vocal tract. In this paper, we propose a new geometric transformation that improves the accuracy of these measurements. Our transformation maps anatomical landmarks' X-Y coordinates along the midsagittal plane onto six relative measures: Lip Aperture (LA), Lip Protrusion (LP), Tongue Body Constriction Location (TBCL) and Degree (TBCD), and Tongue Tip Constriction Location (TTCL) and Degree (TTCD). Our novel contribution is the extension of the palate trace towards the inferred anterior pharyngeal line, which improves measurements of tongue body constriction.
    Efficiency Separation between RL Methods: Model-Free, Model-Based and Goal-Conditioned. (arXiv:2309.16291v1 [cs.LG])
    We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods suffer a lower bound exponential in the horizon for their interactions with the environment to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, which can efficiently solve the problems in the family. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.
    Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics. (arXiv:2309.16109v1 [cs.LG])
    Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
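    The difference between the two losses contrasted in the abstract above can be seen in a few lines. This is a minimal sketch with illustrative function names; it compares the loss values only and does not reproduce the paper's analysis of the learning dynamics.

```python
import math

def l2_loss(pred, target):
    """Squared Euclidean distance between two representation vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def cosine_loss(pred, target):
    """Negative cosine similarity: both vectors are normalised before comparison."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return -dot / norm

target = [3.0, 4.0]
aligned = [3.0, 4.0]
scaled = [30.0, 40.0]  # same direction, ten times the norm

l2_aligned, l2_scaled = l2_loss(aligned, target), l2_loss(scaled, target)
cos_aligned, cos_scaled = cosine_loss(aligned, target), cosine_loss(scaled, target)
```

    The L2 loss penalises the rescaled vector heavily, while the cosine loss is unchanged by the rescaling because of the normalisation. That scale invariance is the feature normalization the abstract refers to.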
    Multi-Swap $k$-Means++. (arXiv:2309.16384v1 [cs.CG])
    The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the $k$-means++ sampling distribution to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, hence allowing multiple centers to be swapped at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly, we show that our approach yields substantial practical improvements: we observe significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
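    The seeding and local search steps referred to in the abstract above can be sketched as follows. The code shows $k$-means++ seeding by $D^2$ sampling and a single-swap local search step in the style of Lattanzi and Sohler; it does not implement the multi-swap neighborhoods proposed in the paper. Function names and the fallback for degenerate inputs are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

def kmeanspp_seed(points, k, rng):
    """Pick k centers: the first uniformly, the rest by D^2 sampling."""
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0.0:
            nxt = rng.choice(points)  # every point already coincides with a center
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, squared_distance(p, nxt)) for old, p in zip(d2, points)]
    return centers

def local_search_step(points, centers, rng):
    """Sample one candidate by D^2 sampling and apply the best improving swap."""
    d2 = [min(squared_distance(p, c) for c in centers) for p in points]
    if sum(d2) == 0.0:
        return centers
    candidate = rng.choices(points, weights=d2, k=1)[0]
    best, best_cost = centers, kmeans_cost(points, centers)
    for i in range(len(centers)):
        trial = centers[:i] + [candidate] + centers[i + 1:]
        cost = kmeans_cost(points, trial)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeded = kmeanspp_seed(points, 2, random.Random(0))
improved = local_search_step(points, seeded, random.Random(2))
```

    Because a swap is kept only when it lowers the cost, repeating `local_search_step` never makes the clustering worse. A multi-swap variant would consider replacing several centers at once, at the price of a larger neighborhood to search.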
    A parsimonious, computationally efficient machine learning method for spatial regression. (arXiv:2309.16448v1 [stat.ML])
    We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is an effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
    A Metaheuristic for Amortized Search in High-Dimensional Parameter Spaces. (arXiv:2309.16465v1 [q-bio.QM])
    Parameter inference for dynamical models of (bio)physical systems remains a challenging problem. Intractable gradients, high-dimensional spaces, and non-linear model functions are typically problematic without large computational budgets. A recent body of work in that area has focused on Bayesian inference methods, which consider parameters under their statistical distributions and therefore, do not derive point estimates of optimal parameter values. Here we propose a new metaheuristic that drives dimensionality reductions from feature-informed transformations (DR-FFIT) to address these bottlenecks. DR-FFIT implements an efficient sampling strategy that facilitates a gradient-free parameter search in high-dimensional spaces. We use artificial neural networks to obtain differentiable proxies for the model's features of interest. The resulting gradients enable the estimation of a local active subspace of the model within a defined sampling region. This approach enables efficient dimensionality reductions of highly non-linear search spaces at a low computational cost. Our test data show that DR-FFIT boosts the performances of random-search and simulated-annealing against well-established metaheuristics, and improves the goodness-of-fit of the model, all within contained run-time costs.
    Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation. (arXiv:2309.15938v1 [eess.AS])
    In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
    Learning to Transform for Generalizable Instance-wise Invariance. (arXiv:2309.16672v1 [cs.CV])
    Computer vision research has long aimed to build systems that are robust to spatial transformations found in natural data. Traditionally, this is done using data augmentation or hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and dependent on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test-time. We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the predictions over them. Since this distribution only depends on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. This normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method shows accuracy and robustness gains on CIFAR10, CIFAR10-LT, and TinyImageNet.
    TraCE: Trajectory Counterfactual Explanation Scores. (arXiv:2309.15965v1 [cs.LG])
    Counterfactual explanations, and their associated algorithmic recourse, are typically leveraged to understand, explain, and potentially alter a prediction coming from a black-box classifier. In this paper, we propose to extend the use of counterfactuals to evaluate progress in sequential decision making tasks. To this end, we introduce a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which is able to distill and condense progress in highly complex scenarios into a single value. We demonstrate TraCE's utility across domains by showcasing its main properties in two case studies spanning healthcare and climate change.
    AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model. (arXiv:2309.16058v1 [cs.LG])
    We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
    Uncertainty-Aware Decision Transformer for Stochastic Driving Environments. (arXiv:2309.16397v1 [cs.LG])
    Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks. Recent successes of Transformers inspire casting offline RL as sequence modeling, which performs well in long-horizon tasks. However, they are overly optimistic in stochastic environments with incorrect assumptions that the same goal can be consistently achieved by identical actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates state uncertainties by the conditional mutual information between transitions and returns, and segments sequences accordingly. Discovering the 'uncertainty accumulation' and 'temporal locality' properties of driving environments, UNREST replaces the global returns in decision transformers with less uncertain truncated returns, to learn from true outcomes of agent actions rather than environment transitions. We also dynamically evaluate environmental uncertainty during inference for cautious planning. Extensive experimental results demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.
    Axiomatic Aggregations of Abductive Explanations. (arXiv:2109.03890v5 [cs.LG] UPDATED)
    The recent criticisms of the robustness of post hoc model approximation explanation methods (like LIME and SHAP) have led to the rise of model-precise abductive explanations. For each data point, abductive explanations provide a minimal subset of features that are sufficient to generate the outcome. While theoretically sound and rigorous, abductive explanations suffer from a major issue -- there can be several valid abductive explanations for the same data point. In such cases, providing a single abductive explanation can be insufficient; on the other hand, providing all valid abductive explanations can be incomprehensible due to their size. In this work, we solve this issue by aggregating the many possible abductive explanations into feature importance scores. We propose three aggregation methods: two based on power indices from cooperative game theory and a third based on a well-known measure of causal strength. We characterize these three methods axiomatically, showing that each of them uniquely satisfies a set of desirable properties. We also evaluate them on multiple datasets and show that these explanations are robust to the attacks that fool SHAP and LIME.
    Learning Dissipative Neural Dynamical Systems. (arXiv:2309.16032v1 [cs.LG])
    Consider an unknown nonlinear dynamical system that is known to be dissipative. The objective of this paper is to learn a neural dynamical model that approximates this system, while preserving the dissipativity property in the model. In general, imposing dissipativity constraints during neural network training is a hard problem for which no known techniques exist. In this work, we address the problem of learning a dissipative neural dynamical system model in two stages. First, we learn an unconstrained neural dynamical model that closely approximates the system dynamics. Next, we derive sufficient conditions to perturb the weights of the neural dynamical model to ensure dissipativity, followed by perturbation of the biases to retain the fit of the model to the trajectories of the nonlinear system. We show that these two perturbation problems can be solved independently to obtain a neural dynamical model that is guaranteed to be dissipative while closely approximating the nonlinear system.
    Augmenting LLMs with Knowledge: A survey on hallucination prevention. (arXiv:2309.16459v1 [cs.CL])
    Large pre-trained language models have demonstrated their proficiency in storing factual knowledge within their parameters and achieving remarkable results when fine-tuned for downstream natural language processing tasks. Nonetheless, their capacity to access and manipulate knowledge with precision remains constrained, resulting in performance disparities on knowledge-intensive tasks when compared to task-specific architectures. Additionally, the challenges of providing provenance for model decisions and maintaining up-to-date world knowledge persist as open research frontiers. To address these limitations, the integration of pre-trained models with differentiable access mechanisms to explicit non-parametric memory emerges as a promising solution. This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources, including external knowledge bases and search engines. While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules to augment their contextual processing capabilities, departing from the conventional language modeling paradigm. Through an exploration of current advancements in augmenting large language models with knowledge, this work concludes that this emerging research direction holds the potential to address prevalent issues in traditional LMs, such as hallucinations, un-grounded responses, and scalability challenges.
    A Design Toolbox for the Development of Collaborative Distributed Machine Learning Systems. (arXiv:2309.16584v1 [cs.MA])
    To leverage training data for the sufficient training of ML models from multiple parties in a confidentiality-preserving way, various collaborative distributed machine learning (CDML) system designs have been developed, for example, to perform assisted learning, federated learning, and split learning. CDML system designs show different traits, for example, high agent autonomy, machine learning (ML) model confidentiality, and fault tolerance. Facing a wide variety of CDML system designs with different traits, it is difficult for developers to design CDML systems with traits that match use case requirements in a targeted way. However, inappropriate CDML system designs may result in CDML systems failing their envisioned purposes. We developed a CDML design toolbox that can guide the development of CDML systems. Based on the CDML design toolbox, we present CDML system archetypes with distinct key traits that can support the design of CDML systems to meet use case requirements.
    Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling. (arXiv:2309.16139v1 [cs.CV])
    Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold.
    Robust Offline Reinforcement Learning -- Certify the Confidence Interval. (arXiv:2309.16631v1 [cs.LG])
    Reinforcement learning (RL), especially deep RL, has received growing attention from the research community. However, the security of RL has become a clear concern as attack methods mature. To defend against such adversarial attacks, several practical approaches have been developed, such as adversarial training and data filtering. However, these methods are mostly based on empirical algorithms and experiments, without rigorous theoretical analysis of their robustness. In this paper, we develop an algorithm to certify the robustness of a given policy offline with random smoothing, which can be proven and carried out as efficiently as approaches without random smoothing. Experiments on different environments confirm the correctness of our algorithm.
    Infinite Neural Network Quantum States: Entanglement and Training Dynamics. (arXiv:2112.00723v2 [quant-ph] UPDATED)
    We study infinite limits of neural network quantum states ($\infty$-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For $\infty$-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an $\infty$-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. $\infty$-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.
    Navigating Healthcare Insights: A Birds Eye View of Explainability with Knowledge Graphs. (arXiv:2309.16593v1 [cs.AI])
    Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover KG workflow, including construction, relationship extraction, reasoning, and their applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions.
    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. (arXiv:2309.16620v1 [stat.ML])
    The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
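    A minimal sketch of the 1/sqrt(depth) residual branch scale mentioned in the abstract above, in plain Python. It illustrates only the branch multiplier; the $\mu$P parameterization and the paper's training setup are not implemented, and the names `make_block`, `branch`, and `forward` are hypothetical. The branch is a toy tanh layer chosen for illustration.

```python
import math
import random


def make_block(dim, rng):
    """Random dense weights with 1/sqrt(dim) scale (toy stand-in for a residual branch)."""
    std = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]


def branch(weights, h):
    """Toy residual branch: tanh(W h). Every output entry lies in [-1, 1]."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]


def forward(h, blocks, scale_by_depth=True):
    """Apply h <- h + scale * branch(h) for each block.

    With scale_by_depth=True the branch multiplier is 1/sqrt(depth);
    otherwise it is 1 (the unscaled residual network).
    """
    if not blocks:
        return h
    depth = len(blocks)
    scale = 1.0 / math.sqrt(depth) if scale_by_depth else 1.0
    for weights in blocks:
        update = branch(weights, h)
        h = [x + scale * u for x, u in zip(h, update)]
    return h


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


rng = random.Random(0)
dim, depth = 8, 64
blocks = [make_block(dim, rng) for _ in range(depth)]
h0 = [1.0] * dim

scaled = forward(h0, blocks, scale_by_depth=True)
unscaled = forward(h0, blocks, scale_by_depth=False)
print("scaled norm:", l2_norm(scaled), "unscaled norm:", l2_norm(unscaled))
```

    Because each toy branch output is bounded by 1 per coordinate, the scaled network's output norm can grow by at most sqrt(depth * dim) over the whole stack, whereas the unscaled bound grows linearly in depth.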
    AtomSurf : Surface Representation for Learning on Protein Structures. (arXiv:2309.16519v1 [cs.LG])
    Recent advancements in Cryo-EM and protein structure prediction algorithms have made large-scale protein structures accessible, paving the way for machine learning-based functional annotations. The field of geometric deep learning focuses on creating methods working on geometric data. An essential aspect of learning from protein structures is representing these structures as a geometric object (be it a grid, graph, or surface) and applying a learning method tailored to this representation. The performance of a given approach will then depend on both the representation and its corresponding learning method. In this paper, we investigate representing proteins as 3D mesh surfaces and incorporate them into an established representation benchmark. Our first finding is that despite promising preliminary results, the surface representation alone does not seem competitive with 3D grids. Building on this, we introduce a synergistic approach, combining surface representations with graph-based methods, resulting in a general framework that incorporates both representations in learning. We show that using this combination, we are able to obtain state-of-the-art results across all tested tasks. Our code and data can be found online: https://github.com/Vincentx15/atom2D .
    Nonlinear MPC design for incrementally ISS systems with application to GRU networks. (arXiv:2309.16428v1 [eess.SY])
    This brief addresses the design of a Nonlinear Model Predictive Control (NMPC) strategy for exponentially incremental Input-to-State Stable (ISS) systems. In particular, a novel formulation is devised, which does not necessitate the onerous computation of terminal ingredients, but rather relies on the explicit definition of a minimum prediction horizon ensuring closed-loop stability. The designed methodology is particularly suited for the control of systems learned by Recurrent Neural Networks (RNNs), which are known for their enhanced modeling capabilities and for which the incremental ISS properties can be studied thanks to simple algebraic conditions. The approach is applied to Gated Recurrent Unit (GRU) networks, providing also a method for the design of a tailored state observer with convergence guarantees. The resulting control architecture is tested on a benchmark system, demonstrating its good control performances and efficient applicability.
    Uncertainty Quantification for Eosinophil Segmentation. (arXiv:2309.16536v1 [eess.IV])
    Eosinophilic Esophagitis (EoE) is an allergic condition increasing in prevalence. To diagnose EoE, pathologists must find 15 or more eosinophils within a single high-power field (400X magnification). Determining whether or not a patient has EoE can be an arduous process and any medical imaging approaches used to assist diagnosis must consider both efficiency and precision. We propose an improvement of Adorno et al.'s approach for quantifying eosinophils using deep image segmentation. Our new approach leverages Monte Carlo Dropout, a common approach in deep learning to reduce overfitting, to provide uncertainty quantification on current deep learning models. The uncertainty can be visualized in an output image to evaluate model performance, provide insight into how deep learning algorithms function, and assist pathologists in identifying eosinophils.
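    The Monte Carlo Dropout idea referenced above can be sketched as follows: keep dropout active at inference, run several stochastic forward passes, and read the spread of the predictions as an uncertainty estimate. This is a toy single-pixel logistic model in plain Python, not the segmentation network from the paper; the function names, weights, and feature values are hypothetical.

```python
import math
import random


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass with dropout left ON (inverted-dropout rescaling)."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:          # unit survives this pass
            total += w * x / keep        # rescale so the expectation is unchanged
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid -> foreground probability


def mc_dropout(features, weights, drop_prob=0.5, passes=50, seed=0):
    """Return (mean prediction, variance) over several stochastic passes."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = random.Random(seed)
    preds = [stochastic_forward(features, weights, drop_prob, rng)
             for _ in range(passes)]
    mean = sum(preds) / passes
    var = sum((p - mean) ** 2 for p in preds) / passes
    return mean, var


# Hypothetical features and weights for a single pixel.
features = [0.8, 0.1, 0.5, 0.9]
weights = [1.2, -0.7, 0.4, 2.0]
mean, var = mc_dropout(features, weights)
print("mean probability:", mean, "variance (uncertainty):", var)
```

    Computing the variance per pixel and rendering it as an image would give the kind of uncertainty map the abstract describes.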
    Systematic Sampling and Validation of Machine Learning-Parameterizations in Climate Models. (arXiv:2309.16177v1 [physics.ao-ph])
    Progress in hybrid physics-machine learning (ML) climate simulations has been limited by the difficulty of obtaining performant coupled (i.e. online) simulations. While evaluating hundreds of ML parameterizations of subgrid closures (here of convection and radiation) offline is straightforward, online evaluation at the same scale is technically challenging. Our software automation achieves an order-of-magnitude larger sampling of online modeling errors than has previously been examined. Using this, we evaluate the hybrid climate model performance and define strategies to improve it. We show that model online performance improves when incorporating memory, a relative humidity input feature transformation, and additional input variables. We also reveal substantial variation in online error and inconsistencies between offline vs. online error statistics. The implication is that hundreds of candidate ML models should be evaluated online to detect the effects of parameterization design choices. This is considerably more sampling than tends to be reported in the current literature.
    Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems. (arXiv:2309.16495v1 [cs.CV])
    Searching for available parking spots in high-density urban centers is a stressful task for drivers that can be mitigated by systems that know in advance the nearest parking space available. To this end, image-based systems offer cost advantages over other sensor-based alternatives (e.g., ultrasonic sensors), requiring less physical infrastructure for installation and maintenance. Despite recent deep learning advances, deploying intelligent parking monitoring is still a challenge since most approaches involve collecting and labeling large amounts of data, which is laborious and time-consuming. Our study aims to uncover the challenges in creating a global framework, trained using publicly available labeled parking lot images, that performs accurately across diverse scenarios, enabling the parking space monitoring as a ready-to-use system to deploy in a new environment. Through exhaustive experiments involving different datasets and deep learning architectures, including fusion strategies and ensemble methods, we found that models trained on diverse datasets can achieve 95% accuracy without the burden of data annotation and model training on the target parking lot.
    Dynamic Selection in Algorithmic Decision-making. (arXiv:2108.12547v3 [econ.EM] UPDATED)
    This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
    Voting Network for Contour Levee Farmland Segmentation and Classification. (arXiv:2309.16561v1 [cs.CV])
    High-resolution aerial imagery allows fine details in the segmentation of farmlands. However, small objects and features introduce distortions to the delineation of object boundaries, and larger contextual views are needed to mitigate class confusion. In this work, we present an end-to-end trainable network for segmenting farmlands with contour levees from high-resolution aerial imagery. A fusion block is devised that includes multiple voting blocks to achieve image segmentation and classification. We integrate the fusion block with a backbone and produce both semantic predictions and segmentation slices. The segmentation slices are used to perform majority voting on the predictions. The network is trained to assign the most likely class label of a segment to its pixels, learning the concept of farmlands rather than analyzing constitutive pixels separately. We evaluate our method using images from the National Agriculture Imagery Program. Our method achieved an average accuracy of 94.34%. Compared to the state-of-the-art methods, the proposed method obtains an improvement of 6.96% and 2.63% in the F1 score on average.
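    The segment-level majority vote described above can be illustrated with a small plain-Python sketch: every pixel in a segment receives the class that most pixels in that segment were predicted as. This is a post-hoc illustration only; in the paper the voting is part of a trainable network, and the function name, class names, and data below are hypothetical.

```python
from collections import Counter


def vote_per_segment(pixel_labels, segment_ids):
    """Relabel each pixel with the majority class of its segment.

    pixel_labels: per-pixel predicted class labels (flattened image).
    segment_ids:  per-pixel segment index, same length as pixel_labels.
    """
    votes = {}
    for label, seg in zip(pixel_labels, segment_ids):
        votes.setdefault(seg, Counter())[label] += 1
    # Majority class per segment (ties resolve to the label seen first).
    winners = {seg: counts.most_common(1)[0][0] for seg, counts in votes.items()}
    return [winners[seg] for seg in segment_ids]


pixel_labels = ["levee", "levee", "other", "other", "other", "levee"]
segment_ids = [0, 0, 0, 1, 1, 1]
print(vote_per_segment(pixel_labels, segment_ids))
# -> ['levee', 'levee', 'levee', 'other', 'other', 'other']
```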
    Selective Nonparametric Regression via Testing. (arXiv:2309.16412v1 [stat.ML])
    Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    Jointly Training Large Autoregressive Multimodal Models. (arXiv:2309.15564v2 [cs.LG] UPDATED)
    In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose.
    CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture. (arXiv:2309.16299v1 [cs.RO])
    Enabling robots to effectively imitate expert skills in long-horizon tasks such as locomotion, manipulation, and more, poses a long-standing challenge. Existing imitation learning (IL) approaches for robots still grapple with sub-optimal performance in complex tasks. In this paper, we consider how this challenge can be addressed within the human cognitive priors. Heuristically, we extend the usual notion of action to a dual Cognition (high-level)-Action (low-level) architecture by introducing intuitive human cognitive priors, and propose a novel skill IL framework through human-robot interaction, called Cognition-Action-based Skill Imitation Learning (CasIL), for the robotic agent to effectively cognize and imitate the critical skills from raw visual demonstrations. CasIL enables both cognition and action imitation, while high-level skill cognition explicitly guides low-level primitive actions, providing robustness and reliability to the entire skill IL process. We evaluated our method on MuJoCo and RLBench benchmarks, as well as on the obstacle avoidance and point-goal navigation tasks for quadrupedal robot locomotion. Experimental results show that our CasIL consistently achieves competitive and robust skill imitation capability compared to other counterparts in a variety of long-horizon robotic tasks.
    Method and Validation for Optimal Lineup Creation for Daily Fantasy Football Using Machine Learning and Linear Programming. (arXiv:2309.15253v2 [cs.LG] UPDATED)
    Daily fantasy sports (DFS) are weekly or daily online contests where real-game performances of individual players are converted to fantasy points (FPTS). Users select players for their lineup to maximize their FPTS within a set player salary cap. This paper focuses on (1) the development of a method to forecast NFL player performance under uncertainty and (2) determining an optimal lineup to maximize FPTS under a set salary limit. A supervised learning neural network was created and used to project FPTS based on past player performance (2018 NFL regular season for this work) prior to the upcoming week. These projected FPTS were used in a mixed integer linear program to find the optimal lineup. The performance of resultant lineups was compared to randomly-created lineups. On average, the optimal lineups outperformed the random lineups. The generated lineups were then compared to real-world lineups from users on DraftKings. The generated lineups generally fell in approximately the 31st percentile (median). The FPTS methods and predictions presented here can be further improved using this study as a baseline comparison.
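    The lineup step above is a 0/1 selection problem: pick players to maximize projected FPTS subject to a salary cap. The sketch below solves a tiny instance by exhaustive search, used here as a plainly named stand-in for a mixed integer linear programming solver. It omits the position constraints that real DFS contests impose, and all player names, salaries, projections, and the function name are hypothetical.

```python
from itertools import combinations


def best_lineup(players, lineup_size, salary_cap):
    """Exhaustively search lineups of a fixed size under a salary cap.

    players: list of (name, salary, projected_points) tuples.
    Returns (lineup names, total projected points), or (None, 0.0) if
    no lineup fits under the cap.
    """
    best_names, best_points = None, 0.0
    for combo in combinations(players, lineup_size):
        salary = sum(p[1] for p in combo)
        if salary > salary_cap:
            continue  # violates the salary constraint
        points = sum(p[2] for p in combo)
        if best_names is None or points > best_points:
            best_names = tuple(p[0] for p in combo)
            best_points = points
    return best_names, best_points


# Hypothetical player pool: (name, salary, projected FPTS).
players = [
    ("A", 6000, 18.0),
    ("B", 5000, 15.0),
    ("C", 4000, 13.0),
    ("D", 3000, 8.0),
]
print(best_lineup(players, lineup_size=2, salary_cap=10000))
# -> (('A', 'C'), 31.0)
```

    Exhaustive search only scales to toy pools; a full-size player pool would need an actual MILP solver, as in the paper.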
    Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs. (arXiv:2309.16357v1 [cs.LG])
    Most knowledge graph completion (KGC) methods learn latent representations of entities and relations of a given graph by mapping them into a vector space. Although the majority of these methods focus on static knowledge graphs, a large number of publicly available KGs contain temporal information stating the time instant/period over which a certain fact has been true. Such graphs are often known as temporal knowledge graphs. Furthermore, knowledge graphs may also contain textual descriptions of entities and relations. Both temporal information and textual descriptions are not taken into account during representation learning by static KGC methods, and only structural information of the graph is leveraged. Recently, some studies have used temporal information to improve link prediction, yet they do not exploit textual descriptions and do not support inductive inference (prediction on entities that have not been seen in training). We propose a novel framework called TEMT that exploits the power of pre-trained language models (PLMs) for text-enhanced temporal knowledge graph completion. The knowledge stored in the parameters of a PLM allows TEMT to produce rich semantic representations of facts and to generalize on previously unseen entities. TEMT leverages textual and temporal information available in a KG, treats them separately, and fuses them to get plausibility scores of facts. Unlike previous approaches, TEMT effectively captures dependencies across different time points and enables predictions on unseen entities. To assess the performance of TEMT, we carried out several experiments including time interval prediction, both in transductive and inductive settings, and triple classification. The experimental results show that TEMT is competitive with the state-of-the-art.
    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes. (arXiv:2212.10538v2 [cs.LG] UPDATED)
    Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
    Quantum Self-Attention Neural Networks for Text Classification. (arXiv:2205.05625v2 [quant-ph] UPDATED)
    An emerging direction of quantum computing is to establish meaningful quantum applications in various fields of artificial intelligence, including natural language processing (NLP). Although some efforts based on syntactic analysis have opened the door to research in Quantum NLP (QNLP), limitations such as heavy syntactic preprocessing and syntax-dependent network architecture make them impracticable on larger and real-world data sets. In this paper, we propose a new simple network architecture, called the quantum self-attention neural network (QSANN), which can compensate for these limitations. Specifically, we introduce the self-attention mechanism into quantum neural networks and then utilize a Gaussian projected quantum self-attention serving as a sensible quantum version of self-attention. As a result, QSANN is effective and scalable on larger data sets and has the desirable property of being implementable on near-term quantum devices. In particular, our QSANN outperforms the best existing QNLP model based on syntactic analysis as well as a simple classical self-attention neural network in numerical experiments of text classification tasks on public data sets. We further show that our method exhibits robustness to low-level quantum noises and showcases resilience to quantum neural network architectures.
    Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings. (arXiv:2309.16564v1 [cs.LG])
    Unsupervised learning allows us to leverage unlabelled data, which has become abundantly available, and to create embeddings that are usable on a variety of downstream tasks. However, the typical lack of interpretability of unsupervised representation learning has become a limiting factor with regard to recent transparent-AI regulations. In this paper, we study graph representation learning and we show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis. We also introduce additional metrics addressing the lack of formalism and metrics in the understudied area of unsupervised-representation learning interpretability. Our results are supported by an experimental study applied to both graph-level and node-level tasks and show that interpretable embeddings provide state-of-the-art performance on subsequent downstream tasks.
    Unsupervised Fact Verification by Language Model Distillation. (arXiv:2309.16540v1 [cs.CL])
    Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.
    Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures. (arXiv:2309.16645v1 [cs.LG])
    In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase, and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patients that are persistent across different initializations of a specific neural architecture. This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.
    Using Weak Supervision and Data Augmentation in Question Answering. (arXiv:2309.16175v1 [cs.CL])
    The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions. During the early days of the pandemic, one of the biggest challenges we faced was the lack of peer-reviewed biomedical articles on COVID-19 that could be used to train machine learning models for question answering (QA). In this paper, we explore the roles weak supervision and data augmentation play in training deep neural network QA models. First, we investigate whether labels generated automatically from the structured abstracts of scholarly papers using an information retrieval algorithm, BM25, provide a weak supervision signal to train an extractive QA model. We also curate new QA pairs using information retrieval techniques, guided by the clinicaltrials.gov schema and the structured abstracts of articles, in the absence of annotated data from biomedical domain experts. Furthermore, we explore augmenting the training data of a deep neural network model with linguistic features from external sources such as lexical databases to account for variations in word morphology and meaning. To better utilize our training data, we apply curriculum learning to domain adaptation, fine-tuning our QA model in stages based on characteristics of the QA pairs. We evaluate our methods in the context of QA models at the core of a system to answer questions about COVID-19.
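    The abstract above names BM25 as the retrieval algorithm used to derive weak labels from structured abstracts. As a rough illustration only, the sketch below scores candidate sentences against a question with the standard Okapi BM25 formula and takes the top-scoring sentence as a noisy label. The function names, the whitespace tokenizer, and the defaults `k1=1.5`, `b=0.75` are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the +1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def weak_label(question, sentences):
    """Hypothetical helper: index of the best-scoring sentence, used as a noisy label."""
    def tokenize(s):
        return s.lower().split()

    scores = bm25_scores(tokenize(question), [tokenize(s) for s in sentences])
    return max(range(len(sentences)), key=scores.__getitem__)
```

    A label produced this way is only as good as the lexical overlap between question and sentence, which is why it counts as weak supervision rather than annotation.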
    End-to-end Risk Prediction of Atrial Fibrillation from the 12-Lead ECG by Deep Neural Networks. (arXiv:2309.16335v1 [cs.LG])
    Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on a large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we obtain that patients in the high-risk group (i.e. with the probability of a future AF case being greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients belonging to the minimal-risk group (i.e. with the probability of a future AF case being less than or equal to 0.1) have more than 85% chance of remaining AF free up until after seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model possesses the potential of providing valuable and useful information in decision-making and patient management processes.
    Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects. (arXiv:2309.16457v1 [cs.LG])
    Decoding memory content from brain activity during sleep has long been a goal in neuroscience. While spontaneous reactivation of memories during sleep in rodents is known to support memory consolidation and offline learning, capturing memory replay in humans is challenging due to the absence of well-annotated sleep datasets and the substantial differences in neural patterns between wakefulness and sleep. To address these challenges, we designed a novel cognitive neuroscience experiment and collected a comprehensive, well-annotated electroencephalography (EEG) dataset from 52 subjects during both wakefulness and sleep. Leveraging this benchmark dataset, we developed the Universal Sleep Decoder (USD) to align neural representations between wakefulness and sleep across subjects. Our model achieves up to 16.6% top-1 zero-shot accuracy on unseen subjects, comparable to decoding performances using individual sleep data. Furthermore, fine-tuning USD on test subjects enhances decoding accuracy to 25.9% top-1 accuracy, a substantial improvement over the baseline chance of 6.7%. Model comparison and ablation analyses reveal that our design choices, including the use of (i) an additional contrastive objective to integrate awake and sleep neural signals and (ii) the pretrain-finetune paradigm to incorporate different subjects, significantly contribute to these performances. Collectively, our findings and methodologies represent a significant advancement in the field of sleep decoding.
    Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration. (arXiv:2309.16114v1 [cs.RO])
    Robots with increasing autonomy progress our space exploration capabilities, particularly for in-situ exploration and sampling to stand in for human explorers. Currently, humans drive robots to meet scientific objectives, but depending on the robot's location, the exchange of information and driving commands between the human operator and robot may cause undue delays in mission fulfillment. An autonomous robot encoded with a scientific objective and an exploration strategy incurs no communication delays and can fulfill missions more quickly. Active learning algorithms offer this capability of intelligent exploration, but the underlying model structure varies the performance of the active learning algorithm in accurately forming an understanding of the environment. In this paper, we investigate the performance differences between active learning algorithms driven by Gaussian processes or Bayesian neural networks for exploration strategies encoded on agents that are constrained in their trajectories, like planetary surface rovers. These two active learning strategies were tested in a simulation environment against science-blind strategies to predict the spatial distribution of a variable of interest along multiple datasets. The performance metrics of interest are model accuracy in root mean squared (RMS) error, training time, model convergence, total distance traveled until convergence, and total samples until convergence. Active learning strategies encoded with Gaussian processes require less computation to train, converge to an accurate model more quickly, and propose trajectories of shorter distance, except in a few complex environments in which Bayesian neural networks achieve a more accurate model in the large data regime due to their more expressive functional bases. The paper concludes with advice on when and how to implement either exploration strategy for future space missions.
    Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials. (arXiv:2309.16571v1 [cs.LG])
    Additive manufacturing has revolutionized the manufacturing of complex parts by enabling direct material joining and offers several advantages such as cost-effective manufacturing of complex parts, reducing manufacturing waste, and opening new possibilities for manufacturing automation. One group of materials for which additive manufacturing holds great potential for enhancing component performance and properties is Functionally Graded Materials (FGMs). FGMs are advanced composite materials that exhibit smoothly varying properties making them desirable for applications in aerospace, automobile, biomedical, and defense industries. Such composition differs from traditional composite materials, since the location-dependent composition changes gradually in FGMs, leading to enhanced properties. Recently, machine learning techniques have emerged as a promising means for fabrication of FGMs through optimizing processing parameters, improving product quality, and detecting manufacturing defects. This paper first provides a brief literature review of works related to FGM fabrication, followed by reviewing works on employing machine learning in additive manufacturing. Afterward, we provide an overview of published works in the literature related to the application of machine learning methods in Directed Energy Deposition and for fabrication of FGMs.
    EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect. (arXiv:2309.16338v1 [cs.LG])
    Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accuracy and higher bias on their local data. According to the Matthew effect, which describes how the advantaged gain more advantage and the disadvantaged lose more over time, deploying such a global model in client applications may worsen the resource disparity among the clients and harm the principles of social welfare and fairness. To mitigate the Matthew effect, we propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian fairness means that the global model learned from FL has: (1) equal accuracy among clients; (2) equal decision bias among clients. Besides achieving egalitarian fairness among the clients, EFFL also aims for performance optimality, minimizing the empirical risk loss and the bias for each client; both are essential for any ML model training, whether centralized or decentralized. We formulate EFFL as a multi-constrained multi-objective optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. We propose a gradient-based three-stage algorithm to obtain the Pareto optimal solutions within the constraint space. Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
    Task-Oriented Koopman-Based Control with Contrastive Encoder. (arXiv:2309.16077v1 [cs.RO])
    We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and a contrastive encoder to simultaneously learn the Koopman latent embedding, operator and associated linear controller within an iterative loop. By prioritizing the task cost as the main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which extends Koopman control beyond low-dimensional systems to high-dimensional, complex nonlinear systems, including pixel-based scenarios.
    Unmasking the Chameleons: A Benchmark for Out-of-Distribution Detection in Medical Tabular Data. (arXiv:2309.16220v1 [cs.LG])
    Despite their success, Machine Learning (ML) models do not generalize effectively to data not originating from the training distribution. To reliably employ ML models in real-world healthcare systems and avoid inaccurate predictions on out-of-distribution (OOD) data, it is crucial to detect OOD samples. Numerous OOD detection approaches have been suggested in other fields - especially in computer vision - but it remains unclear whether the challenge is resolved when dealing with medical tabular data. To answer this pressing need, we propose an extensive reproducible benchmark to compare different methods across a suite of tests including both near and far OODs. Our benchmark leverages the latest versions of eICU and MIMIC-IV, two public datasets encompassing tens of thousands of ICU patients in several hospitals. We consider a wide array of density-based methods and SOTA post-hoc detectors across diverse predictive architectures, including MLP, ResNet, and Transformer. Our findings show that i) the problem appears to be solved for far-OODs, but remains open for near-OODs; ii) post-hoc methods alone perform poorly, but improve substantially when coupled with distance-based mechanisms; iii) the transformer architecture is far less overconfident compared to MLP and ResNet.
    Compositional Sculpting of Iterative Generative Processes. (arXiv:2309.16115v1 [cs.LG])
    High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations between pairs, the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$), and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
    Differential 2D Copula Approximating Transforms via Sobolev Training: 2-Cats Networks. (arXiv:2309.16391v1 [cs.LG])
    Copulas are a powerful statistical tool that captures dependencies across data dimensions. When applying Copulas, we can estimate multivariate distribution functions by initially estimating independent marginals, an easy task, and then a single copulating function, $C$, to connect the marginals, a hard task. For two-dimensional data, a copula is a two-increasing function of the form $C: (u,v)\in \mathbf{I}^2 \rightarrow \mathbf{I}$, where $\mathbf{I} = [0, 1]$. In this paper, we show how Neural Networks (NNs) can approximate any two-dimensional copula non-parametrically. Our approach, denoted as 2-Cats, is inspired by the Physics-Informed Neural Networks and Sobolev Training literature. Not only do we show that we can estimate the output of a 2d Copula better than the state-of-the-art, our approach is non-parametric and respects the mathematical properties of a Copula $C$.
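    The "mathematical properties of a Copula" mentioned above are, for the two-dimensional case, the boundary conditions $C(u,0)=C(0,v)=0$, $C(u,1)=u$, $C(1,v)=v$ and the two-increasing property (every axis-aligned rectangle has non-negative $C$-volume). The sketch below checks these on a finite grid. It is a necessary-condition test only, and the function name, grid size and tolerance are assumptions for illustration, unrelated to how 2-Cats enforces the constraints.

```python
def is_copula_on_grid(C, n=10, tol=1e-12):
    """Check 2D copula properties of C on an (n+1) x (n+1) grid over [0, 1]^2."""
    grid = [i / n for i in range(n + 1)]
    for t in grid:
        # Grounded: C vanishes when either argument is 0.
        if abs(C(t, 0.0)) > tol or abs(C(0.0, t)) > tol:
            return False
        # Uniform margins: C(t, 1) = t and C(1, t) = t.
        if abs(C(t, 1.0) - t) > tol or abs(C(1.0, t) - t) > tol:
            return False
    # Two-increasing: the C-volume of every grid cell is non-negative.
    for u1, u2 in zip(grid, grid[1:]):
        for v1, v2 in zip(grid, grid[1:]):
            volume = C(u2, v2) - C(u2, v1) - C(u1, v2) + C(u1, v1)
            if volume < -tol:
                return False
    return True
```

    The independence copula `lambda u, v: u * v` and the upper Fréchet bound `min` pass this check, while `max` fails the grounding condition.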
    LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite. (arXiv:2309.16342v1 [cs.LG])
    Machine learning has been successfully applied to grid-based PDE modeling in various scientific applications. However, learned PDE solvers based on Lagrangian particle discretizations, which are the preferred approach to problems with free surfaces or complex physics, remain largely unexplored. We present LagrangeBench, the first benchmarking suite for Lagrangian particle problems, focusing on temporal coarse-graining. In particular, our contributions are: (a) seven new fluid mechanics datasets (four in 2D and three in 3D) generated with the Smoothed Particle Hydrodynamics (SPH) method including the Taylor-Green vortex, lid-driven cavity, reverse Poiseuille flow, and dam break, each of which includes different physics like solid wall interactions or free surface, (b) an efficient JAX-based API with various recent training strategies and a neighbor search routine, and (c) a JAX implementation of established Graph Neural Networks (GNNs) like GNS and SEGNN with baseline results. Finally, to measure the performance of learned surrogates we go beyond established position errors and introduce physical metrics like kinetic energy MSE and Sinkhorn distance for the particle distribution. Our codebase is available under the URL: https://github.com/tumaer/lagrangebench
    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces. (arXiv:2309.16597v1 [cs.LG])
    Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.
    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints. (arXiv:2309.16240v1 [cs.LG])
    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
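    For orientation, the baseline that $f$-DPO generalizes is the standard DPO objective, which corresponds to the reverse KL case named in the abstract. The sketch below computes that per-example loss from sequence log-probabilities under the policy and a frozen reference model. It does not implement the $f$-divergence variants, and the argument names and the default `beta=0.1` are assumptions for illustration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard (reverse-KL) DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # Implicit reward difference between the chosen and rejected responses.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), evaluated stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

    When the policy equals the reference the margin is zero and the loss is $\log 2$; the loss falls as the policy shifts probability toward the chosen response relative to the reference.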
    Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey. (arXiv:2309.16398v1 [cs.LG])
    Differential Privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state-of-the-art of differentially private centralized deep learning, thorough analyses of recent advances and open problems, as well as a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: auditing and evaluation methods for private models, improvements of privacy-utility trade-offs, protection against a broad range of threats and attacks, differentially private generative models, and emerging application domains.
    ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging. (arXiv:2309.16353v1 [cs.LG])
    Time series data can be found in almost every domain, ranging from the medical field to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating realistic and useful exemplars and prototypes for time series data. Our approach uses a new form of time series average, the ShapeDTW Barycentric Average. We therefore turn our attention to accurately generating time series prototypes with a novel approach. The existing time series prototyping approaches, such as DTW Barycentering Average (DBA) and SoftDBA, rely on the Dynamic Time Warping (DTW) similarity measure. These approaches suffer from a common problem of generating out-of-distribution artifacts in their prototypes. This is mostly caused by the DTW variant used and its inability to detect neighborhood similarities; instead, it detects absolute similarities. Our proposed method, ShapeDBA, uses the ShapeDTW variant of DTW, which overcomes this issue. We chose time series clustering, a popular form of time series analysis, to evaluate the outcome of ShapeDBA compared to the other prototyping approaches. Coupled with the k-means clustering algorithm, and evaluated on a total of 123 datasets from the UCR archive, our proposed averaging approach is able to achieve new state-of-the-art results in terms of Adjusted Rand Index.
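    As background for the DTW-based averaging discussed above, here is a minimal sketch of the classical Dynamic Time Warping distance between two univariate series, using absolute difference as the pointwise cost. It is the plain point-to-point variant (the "absolute similarities" the abstract refers to), not ShapeDTW, and the function name is an assumption for this example.

```python
def dtw_distance(a, b):
    """Classical DTW distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

    Because each point is compared by value alone, a series and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, are at distance zero.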
    A Primer on Bayesian Neural Networks: Review and Debates. (arXiv:2309.16314v1 [stat.ML])
    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    A framework for paired-sample hypothesis testing for high-dimensional data. (arXiv:2309.16274v1 [stat.ML])
    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimating each feature's contribution to the final result.
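    The Hodges-Lehmann estimator referenced above is, in its classical one-sample form, the median of all pairwise (Walsh) averages of the data, which estimates the pseudomedian. The sketch below shows only that scalar version applied to paired differences; the extension to bisecting hyperplanes described in the abstract is not reproduced here, and the function name is an assumption.

```python
from statistics import median


def hodges_lehmann(diffs):
    """One-sample Hodges-Lehmann estimate: median of Walsh averages.

    `diffs` holds the paired differences (after - before) for one feature.
    """
    n = len(diffs)
    # Walsh averages over all pairs i <= j, including each point with itself.
    walsh = [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)
```

    The estimate is far less sensitive to a single extreme difference than the mean: for `[1, 2, 3, 100]` it returns 2.75, whereas the mean is 26.5.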
    RealFill: Reference-Driven Generation for Authentic Image Completion. (arXiv:2309.16668v1 [cs.CV])
    Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. See more results on our project page: https://realfill.github.io
    GInX-Eval: Towards In-Distribution Evaluation of Graph Neural Network Explanations. (arXiv:2309.16223v1 [cs.AI])
    Diverse explainability methods of graph neural networks (GNN) have recently been developed to highlight the edges and nodes in the graph that contribute the most to the model predictions. However, it is not clear yet how to evaluate the correctness of those explanations, whether it is from a human or a model perspective. One unaddressed bottleneck in the current evaluation procedure is the problem of out-of-distribution explanations, whose distribution differs from those of the training data. This important issue affects existing evaluation metrics such as the popular faithfulness or fidelity score. In this paper, we show the limitations of faithfulness metrics. We propose GInX-Eval (Graph In-distribution eXplanation Evaluation), an evaluation procedure of graph explanations that overcomes the pitfalls of faithfulness and offers new insights on explainability methods. Using a retraining strategy, the GInX score measures how informative removed edges are for the model and the EdgeRank score evaluates if explanatory edges are correctly ordered by their importance. GInX-Eval verifies if ground-truth explanations are instructive to the GNN model. In addition, it shows that many popular methods, including gradient-based methods, produce explanations that are not better than a random designation of edges as important subgraphs, challenging the findings of current works in the area. Results with GInX-Eval are consistent across multiple datasets and align with human evaluation.
    Predicting Cardiovascular Complications in Post-COVID-19 Patients Using Data-Driven Machine Learning Models. (arXiv:2309.16059v1 [cs.LG])
    The COVID-19 pandemic has globally posed numerous health challenges, notably the emergence of post-COVID-19 cardiovascular complications. This study addresses this by utilizing data-driven machine learning models to predict such complications in 352 post-COVID-19 patients from Iraq. Clinical data, including demographics, comorbidities, lab results, and imaging, were collected and used to construct predictive models. These models, leveraging various machine learning algorithms, demonstrated commendable performance in identifying patients at risk. Early detection through these models promises timely interventions and improved outcomes. In conclusion, this research underscores the potential of data-driven machine learning for predicting post-COVID-19 cardiovascular complications, emphasizing the need for continued validation and research in diverse clinical settings.
    Mixup Your Own Pairs. (arXiv:2309.16633v1 [cs.LG])
    In representation learning, regression has traditionally received less attention than classification. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we advocate "mixup your own contrastive pairs for supervised contrastive regression", instead of relying solely on real/augmented samples. Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through extensive experiments on six regression datasets including 2D images, volumetric images, text, tabular data, and time-series signals, coupled with theoretical analysis, we demonstrate that SupReMix pre-training fosters continuous ordered representations of regression data, resulting in significant improvement in regression performance. Furthermore, SupReMix is superior to other approaches in a range of regression challenges including transfer learning, imbalanced training data, and scenarios with fewer training samples.
    The Trickle-down Impact of Reward (In-)consistency on RLHF. (arXiv:2309.16155v1 [cs.CL])
    Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable subject that is understudied is the (in-)consistency of RMs -- whether they can recognize the semantic changes to different prompts and appropriately adapt their reward assignments -- and their impact on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from the RLHF model training? We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stage, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
    Digital Twin-based Anomaly Detection with Curriculum Learning in Cyber-physical Systems. (arXiv:2309.15995v1 [cs.LG])
    Anomaly detection is critical to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of attacks and CPS themselves, anomaly detection in CPS is becoming more and more challenging. In our previous work, we proposed a digital twin-based anomaly detection method, called ATTAIN, which takes advantage of both historical and real-time data of CPS. However, such data vary significantly in terms of difficulty. Therefore, similar to human learning processes, deep learning models (e.g., ATTAIN) can benefit from an easy-to-difficult curriculum. To this end, in this paper, we present a novel approach, named digitaL twin-based Anomaly deTecTion wIth Curriculum lEarning (LATTICE), which extends ATTAIN by introducing curriculum learning to optimize its learning paradigm. LATTICE assigns each sample a difficulty score before feeding it into a training scheduler. The training scheduler samples batches of training data based on these difficulty scores such that learning from easy to difficult data can be performed. To evaluate LATTICE, we use five publicly available datasets collected from five real-world CPS testbeds. We compare LATTICE with ATTAIN and two other state-of-the-art anomaly detectors. Evaluation results show that LATTICE outperforms the three baselines and ATTAIN by 0.906%-2.367% in terms of the F1 score. LATTICE also, on average, reduces the training time of ATTAIN by 4.2% on the five datasets and is on par with the baselines in terms of detection delay time.
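    The easy-to-difficult scheduling idea described above can be sketched as follows, assuming difficulty scores are already available. This is the simplest possible curriculum (a deterministic sort followed by batching). The abstract does not specify how LATTICE computes scores or samples batches, so the function name and the strategy here are illustrative assumptions only.

```python
def curriculum_batches(samples, difficulty, batch_size):
    """Yield batches of samples ordered from lowest to highest difficulty."""
    # Indices sorted by ascending difficulty score (ties keep input order).
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]
```

    With samples `["a", "b", "c", "d", "e"]`, scores `[0.9, 0.1, 0.5, 0.3, 0.7]` and a batch size of 2, the model would see `["b", "d"]`, then `["c", "e"]`, then `["a"]`.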
    Identifying Risk Factors for Post-COVID-19 Mental Health Disorders: A Machine Learning Perspective. (arXiv:2309.16055v1 [cs.LG])
    In this study, we leveraged machine learning techniques to identify risk factors associated with post-COVID-19 mental health disorders. Our analysis, based on data collected from 669 patients across various provinces in Iraq, yielded valuable insights. We found that age, gender, and geographical region of residence were significant demographic factors influencing the likelihood of developing mental health disorders in post-COVID-19 patients. Additionally, comorbidities and the severity of COVID-19 illness were important clinical predictors. Psychosocial factors, such as social support, coping strategies, and perceived stress levels, also played a substantial role. Our findings emphasize the complex interplay of multiple factors in the development of mental health disorders following COVID-19 recovery. Healthcare providers and policymakers should consider these risk factors when designing targeted interventions and support systems for individuals at risk. Machine learning-based approaches can provide a valuable tool for predicting and preventing adverse mental health outcomes in post-COVID-19 patients. Further research and prospective studies are needed to validate these findings and enhance our understanding of the long-term psychological impact of the COVID-19 pandemic. This study contributes to the growing body of knowledge regarding the mental health consequences of the COVID-19 pandemic and underscores the importance of a multidisciplinary approach to address the diverse needs of individuals on the path to recovery. Keywords: COVID-19, mental health, risk factors, machine learning, Iraq
    Learning Interpretable Characteristic Kernels via Decision Forests. (arXiv:1812.00029v3 [stat.ML] UPDATED)
    Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates the decision forest-based kernel can be more powerful and more interpretable than existing methods, flying in the face of conventional wisdom of the trade-off between the two.
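To make the leaf-node proximity idea concrete, here is a small sketch that builds a proximity matrix from precomputed leaf assignments. It assumes `leaf_ids[t][i]` holds the leaf index that sample `i` reaches in tree `t`; how those assignments are obtained (for example, from a fitted forest) is outside the sketch, and the function name is hypothetical. This shows only the generic proximity construction, not the full KMERF procedure.

```python
from typing import List, Sequence


def proximity_kernel(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """K[i][j] = fraction of trees in which samples i and j share a leaf."""
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    kernel = []
    for i in range(n_samples):
        row = []
        for j in range(n_samples):
            # Count the trees where both samples land in the same leaf.
            shared = sum(1 for tree in leaf_ids if tree[i] == tree[j])
            row.append(shared / n_trees)
        kernel.append(row)
    return kernel
```

The resulting matrix is symmetric with ones on the diagonal, which is what allows it to be treated as a similarity kernel.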
    Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training. (arXiv:2309.15881v1 [cs.LG])
    Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiveness. The scheme, termed the multi-layer embeddings training (MLET), trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one thus keeping inference-time model size unchanged. Empirical superiority of MLET is puzzling as its search space is not larger than that of the single-layer embedding. The strong dependence of MLET on the inner dimension is even more surprising. We develop a theory that explains both of these behaviors by showing that MLET creates an adaptive update mechanism modulated by the singular vectors of embeddings. When tested on multiple state-of-the-art recommendation models for click-through rate (CTR) prediction tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows embedding dimension, and model size, reduction by up to 16x, and 5.8x on average, across the models.
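The conversion from a two-layer embedding to a single-layer one can be illustrated with plain matrix products. In this sketch, `w1` (num_items x inner_dim) and `w2` (inner_dim x target_dim) are hypothetical names for the two factors, with `inner_dim` larger than `target_dim` as the abstract describes; the training procedure itself is not shown. Multiplying the factors once yields a table whose rows match the two-layer lookups, so the inference-time table keeps the target dimension.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x d)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def two_layer_lookup(w1: Matrix, w2: Matrix, item: int) -> List[float]:
    """Training-time lookup: select the item's inner vector, then project it."""
    return matmul([w1[item]], w2)[0]


def collapse(w1: Matrix, w2: Matrix) -> Matrix:
    """Inference-time table: a single num_items x target_dim embedding matrix."""
    return matmul(w1, w2)
```

After `collapse`, a lookup is a single row read, which is why the inference-time model size does not depend on the inner dimension.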
    Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks. (arXiv:2309.16347v1 [cs.RO])
    Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate, long-horizon robotic manipulation tasks with sparse rewards. We evaluate our framework and related intrinsic learning methods in an environment challenged with exploration, and a complex robotic manipulation task challenged by both exploration and long horizons. Results show that IGE-LLMs (i) exhibits notably higher performance than related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined with and complement existing learning methods, highlighting its modularity, (iii) is fairly insensitive to different intrinsic scaling parameters, and (iv) maintains robustness against increased levels of uncertainty and horizons.
    HyperPPO: A scalable method for finding small policies for robotic control. (arXiv:2309.16663v1 [cs.RO])
    Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints. We show that our method scales well - more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
    Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning. (arXiv:2309.15867v1 [cs.LG])
    Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 ocular hypertension treatment study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at the baseline. We then identified factors driving fast VF progression using generalized estimating equation (GEE) and justified findings qualitatively and quantitatively. Results: The LCMM model discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The number of eyes in clusters were 794 (25%), 1675 (54%), 531 (17%) and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean of MD decline, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD) and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.
    A novel approach to measuring patent claim scope based on probabilities obtained from (large) language models. (arXiv:2309.10003v2 [cs.CL] UPDATED)
    This work proposes to measure the scope of a patent claim as the reciprocal of the self-information contained in this claim. A probability of occurrence of the claim is obtained from a language model and this probability is used to compute the self-information. Grounded in information theory, this approach is based on the assumption that an unlikely concept is more informative than a usual concept, insofar as it is more surprising. In turn, the more surprising the information required to define the claim, the narrower its scope. Five language models are considered, ranging from simplest models (each word or character is assigned an identical probability) to intermediate models (using average word or character frequencies), to a large language model (GPT2). Interestingly, the scope resulting from the simplest language models is proportional to the reciprocal of the number of words or characters involved in the claim, a metric already used in previous works. Application is made to multiple series of patent claims directed to distinct inventions, where each series consists of claims devised to have a gradually decreasing scope. The performance of the language models is assessed with respect to several ad hoc tests. The more sophisticated the model, the better the results. That is, the GPT2 probability model outperforms models based on word and character frequencies, which themselves outdo the simplest models based on word or character counts. Still, the character count appears to be a more reliable indicator than the word count.
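The scope measure can be written out in a few lines. This sketch assumes the claim probability factorizes into per-token probabilities supplied by some language model (the factorization, the base-2 logarithm, and all function names are assumptions for illustration). It also shows why the simplest model, with an identical probability for every token, gives a scope proportional to the reciprocal of the token count: the self-information is then `num_tokens * log2(vocab_size)`.

```python
import math
from typing import Sequence


def self_information(token_probs: Sequence[float]) -> float:
    """Self-information in bits: -log2 of the product of the token probabilities."""
    return -sum(math.log2(p) for p in token_probs)


def claim_scope(token_probs: Sequence[float]) -> float:
    """Scope as the reciprocal of the claim's self-information."""
    info = self_information(token_probs)
    if info <= 0.0:
        raise ValueError("scope is undefined when the claim carries no information")
    return 1.0 / info


def uniform_scope(num_tokens: int, vocab_size: int) -> float:
    """Simplest model: every token is assigned the same probability 1/vocab_size."""
    return claim_scope([1.0 / vocab_size] * num_tokens)
```

For example, doubling the number of tokens under the uniform model halves the scope, which matches the word-count and character-count metrics mentioned in the abstract.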
    GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis. (arXiv:2309.16022v1 [cs.LG])
    With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programmable Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low-power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools bridge the gap between the non-trivial FPGA development efforts and rapid emergence of new GNN models. In this paper, we propose GNNHLS, an open-source framework to comprehensively evaluate GNN inference acceleration on FPGAs via HLS, containing a software stack for data generation and baseline deployment, and FPGA implementations of 6 well-tuned GNN HLS kernels. We evaluate GNNHLS on 4 graph datasets with distinct topologies and scales. The results show that GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to the CPU baselines. Compared with the GPU baselines, GNNHLS achieves up to 5.16x speedup and 74.5x energy reduction.
    Abdominal multi-organ segmentation in CT using Swinunter. (arXiv:2309.16210v1 [eess.IV])
    Abdominal multi-organ segmentation in computed tomography (CT) is crucial for many clinical applications including disease detection and treatment planning. Deep learning methods have shown unprecedented performance in this respect. However, it is still quite challenging to accurately segment different organs utilizing a single network due to the vague boundaries of organs, the complex background, and the substantially different organ size scales. In this work, we use a transformer-based model for training. Previous years' competitions show that essentially all of the top 5 methods were CNN-based, which is likely due to the lack of data volume that prevents transformer-based methods from taking full advantage. The thousands of samples in this competition may enable the transformer-based model to achieve better results. The results on the public validation set also show that the transformer-based model can achieve an acceptable result and inference time.
    ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers. (arXiv:2309.16119v1 [cs.LG])
    We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 3-bit LLMs for the first time--leveraging state-of-the-art 3-bit OPTQ quantization often outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models--including the first family of 3-bit instruction following Alpaca LLMs--as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    Advancing Federated Learning in 6G: A Trusted Architecture with Graph-based Analysis. (arXiv:2309.05525v3 [cs.NI] UPDATED)
    Integrating native AI support into the network architecture is an essential objective of 6G. Federated Learning (FL) emerges as a potential paradigm, facilitating decentralized AI model training across a diverse range of devices under the coordination of a central server. However, several challenges hinder its wide application in the 6G context, such as malicious attacks and privacy snooping on local model updates, and centralization pitfalls. This work proposes a trusted architecture for supporting FL, which utilizes Distributed Ledger Technology (DLT) and Graph Neural Network (GNN), including three key features. First, a pre-processing layer employing homomorphic encryption is incorporated to securely aggregate local models, preserving the privacy of individual models. Second, given the distributed nature and graph structure between clients and nodes in the pre-processing layer, GNN is leveraged to identify abnormal local models, enhancing system security. Third, DLT is utilized to decentralize the system by selecting one of the candidates to perform the central server's functions. Additionally, DLT ensures reliable data management by recording data exchanges in an immutable and transparent ledger. The feasibility of the novel architecture is validated through simulations, demonstrating improved performance in anomalous model detection and global model accuracy compared to relevant baselines.
    Data Augmentation in the Underparameterized and Overparameterized Regimes. (arXiv:2202.09134v3 [cs.LG] UPDATED)
    We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables.
    Analytical Modelling of Raw Data for Flow-Guided In-body Nanoscale Localization. (arXiv:2309.16034v1 [cs.ET])
    Advancements in nanotechnology and material science are paving the way toward nanoscale devices that combine sensing, computing, data and energy storage, and wireless communication. In precision medicine, these nanodevices show promise for disease diagnostics, treatment, and monitoring from within the patients' bloodstreams. Associating the location of a sensed biological event with the event itself, which is the main proposition of flow-guided in-body nanoscale localization, would be immensely beneficial from the perspective of precision medicine. The nanoscale nature of the nanodevices and the challenging environment that the bloodstream represents result in current flow-guided localization approaches being constrained in their communication and energy-related capabilities. The communication and energy constraints of the nanodevices result in different features of raw data for flow-guided localization, in turn affecting its performance. An analytical modeling of the effects of imperfect communication and constrained energy causing intermittent operation of the nanodevices on the raw data produced by the nanodevices would be beneficial. Hence, we propose an analytical model of raw data for flow-guided localization, where the raw data is modeled as a function of communication and energy-related capabilities of the nanodevice. We evaluate the model by comparing its output with the one obtained through the utilization of a simulator for objective evaluation of flow-guided localization, featuring a comparatively higher level of realism. Our results across a number of scenarios and heterogeneous performance metrics indicate high similarity between the model and simulator-generated raw datasets.
    RLLTE: Long-Term Evolution Project of Reinforcement Learning. (arXiv:2309.16382v1 [cs.AI])
    We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a complete and luxuriant ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia.
    Flexible and efficient spatial extremes emulation via variational autoencoders. (arXiv:2307.08079v2 [stat.ML] UPDATED)
    Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often prohibitively expensive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of the Red Sea and that it has decreased slightly over time.
    DPA-WNO: A gray box model for a class of stochastic mechanics problem. (arXiv:2309.15128v2 [cs.LG] UPDATED)
    The well-known governing physics in science and engineering is often based on certain assumptions and approximations. Therefore, analyses and designs carried out based on these equations are also approximate. The emergence of data-driven models has, to a certain degree, addressed this challenge; however, the purely data-driven models often (a) lack interpretability, (b) are data-hungry, and (c) do not generalize beyond the training window. Operator learning has recently been proposed as a potential alternative to address the aforementioned challenges; however, the challenges are still persistent. We here argue that one of the possible solutions resides in data-physics fusion, where the data-driven model is used to correct/identify the missing physics. To that end, we propose a novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO). The proposed DPA-WNO blends a differentiable physics solver with the Wavelet Neural Operator (WNO), where the role of WNO is to model the missing physics. This empowers the proposed framework to exploit the capability of WNO to learn from data while retaining the interpretability and generalizability associated with physics-based solvers. We illustrate the applicability of the proposed approach in solving time-dependent uncertainty quantification problems due to randomness in the initial condition. Four benchmark uncertainty quantification and reliability analysis examples from various fields of science and engineering are solved using the proposed approach. The results presented illustrate interesting features of the proposed approach.
    Lossless Transformations and Excess Risk Bounds in Statistical Inference. (arXiv:2307.16735v2 [cs.IT] UPDATED)
    We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.
    Uncovering Neural Scaling Laws in Molecular Representation Learning. (arXiv:2309.15123v2 [physics.chem-ph] UPDATED)
    Molecular Representation Learning (MRL) has emerged as a powerful tool for drug and materials discovery in a variety of tasks such as virtual screening and inverse design. While there has been a surge of interest in advancing model-centric techniques, the influence of both data quantity and quality on molecular representations is not yet clearly understood within this field. In this paper, we delve into the neural scaling behaviors of MRL from a data-centric viewpoint, examining four key dimensions: (1) data modalities, (2) dataset splitting, (3) the role of pre-training, and (4) model capacity. Our empirical studies confirm a consistent power-law relationship between data volume and MRL performance across these dimensions. Additionally, through detailed analysis, we identify potential avenues for improving learning efficiency. To challenge these scaling laws, we adapt seven popular data pruning strategies to molecular data and benchmark their performance. Our findings underline the importance of data-centric MRL and highlight possible directions for future research.
    Telescope: An Automated Hybrid Forecasting Approach on a Level-Playing Field. (arXiv:2309.15871v1 [cs.LG])
    In many areas of decision-making, forecasting is an essential pillar. Consequently, many different forecasting methods have been proposed. From our experience, recently presented forecasting methods are computationally intensive, poorly automated, tailored to a particular data set, or they lack a predictable time-to-result. To this end, we introduce Telescope, a novel machine learning-based forecasting approach that automatically retrieves relevant information from a given time series and splits it into parts, handling each of them separately. In contrast to deep learning methods, our approach does not require parameterization or the training and fitting of a multitude of parameters. It operates with just one time series and provides forecasts within seconds without any additional setup. Our experiments show that Telescope outperforms recent methods by providing accurate and reliable forecasts while making no assumptions about the analyzed time series.
    STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting. (arXiv:2308.10425v4 [cs.LG] UPDATED)
    With the rapid development of the Intelligent Transportation System (ITS), accurate traffic forecasting has emerged as a critical challenge. The key bottleneck lies in capturing the intricate spatio-temporal traffic patterns. In recent years, numerous neural networks with complicated architectures have been proposed to address this issue. However, the advancements in network architectures have encountered diminishing performance gains. In this study, we present a novel component called spatio-temporal adaptive embedding that can yield outstanding results with vanilla transformers. Our proposed Spatio-Temporal Adaptive Embedding transformer (STAEformer) achieves state-of-the-art performance on five real-world traffic forecasting datasets. Further experiments demonstrate that spatio-temporal adaptive embedding plays a crucial role in traffic forecasting by effectively capturing intrinsic spatio-temporal relations and chronological information in traffic time series.
    Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance. (arXiv:2308.05034v3 [cs.CR] UPDATED)
    Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?), attack agnosticity (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), timeliness (can PIDSes efficiently monitor host systems as they run?), and attack reconstruction (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present KAIROS, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of anomalousness for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact summary graphs that accurately describe malicious activity over a stream of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.
    Developing a Philosophical Framework for Fair Machine Learning: Lessons From The Case of Algorithmic Collusion. (arXiv:2208.06308v2 [cs.LG] UPDATED)
    Fair machine learning research has been primarily concerned with classification tasks that result in discrimination. However, as machine learning algorithms are applied in new contexts the harms and injustices that result are qualitatively different than those presently studied. The existing research paradigm in machine learning which develops metrics and definitions of fairness cannot account for these qualitatively different types of injustice. One example of this is the problem of algorithmic collusion and market fairness. The negative consequences of algorithmic collusion affect all consumers, not only particular members of a protected class. Drawing on this case study, I propose an ethical framework for researchers and practitioners in machine learning seeking to develop and apply fairness metrics that extends to new domains. This contribution ties the development of formal metrics of fairness to specifically scoped normative principles. This enables fairness metrics to reflect different concerns from discrimination. I conclude with the limitations of my proposal and discuss promising avenues for future research.
    Creating walls to avoid unwanted points in root finding and optimization. (arXiv:2309.11475v2 [math.OC] UPDATED)
    In root finding and optimization, there are many cases where there is a closed set $A$ to which one would like the sequence constructed by one's favourite method not to converge (here, we do not assume extra properties on $A$ such as being convex or connected). For example, if one wants to find roots, and one chooses initial points in the basin of attraction for 1 root $z^*$ (a fact which one may not know beforehand), then one will always end up in that root. In this case, one would like to have a mechanism to avoid this point $z^*$ in the next runs of one's algorithm. In this paper, we propose two new methods aiming to achieve this. In the first method, we divide the cost function by an appropriate power of the distance function to $A$. This idea is inspired by how one would try to find all roots of a function in 1 variable. In the second method, which is more suitable for constrained optimization, we redefine the value of the function to be a big constant on $A$. We also propose, based on this, an algorithm to escape the basin of attraction of a component of positive dimension to reach another component. As an application, we prove a rigorous guarantee for finding roots of a meromorphic function of 1 complex variable in a given domain. Along the way, we compare with main existing relevant methods in the current literature. We provide several examples in various different settings to illustrate the usefulness of the new approach.
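The first method (dividing by a power of the distance to $A$) can be illustrated in one real variable. This is a sketch in the spirit of classical deflation, not the paper's algorithm: it applies the division directly to a function `f` rather than to a general cost function, uses a finite-difference Newton iteration, and every name and default value is hypothetical. With $A$ set to the root already found, the same starting point then leads to a different root.

```python
from typing import Callable, Sequence


def newton(func: Callable[[float], float], x0: float,
           tol: float = 1e-10, max_iter: int = 100, h: float = 1e-6) -> float:
    """Newton's method with a central-difference derivative estimate."""
    x = x0
    for _ in range(max_iter):
        fx = func(x)
        if abs(fx) < tol:
            return x
        slope = (func(x + h) - func(x - h)) / (2.0 * h)
        if slope == 0.0:
            raise ZeroDivisionError("zero derivative estimate; try another start")
        x -= fx / slope
    raise RuntimeError("Newton iteration did not converge")


def with_wall(func: Callable[[float], float], avoid: Sequence[float],
              power: float = 1.0) -> Callable[[float], float]:
    """Divide func by dist(x, A)**power, where A is the finite set `avoid`."""
    def walled(x: float) -> float:
        # Distance to the set A; evaluating exactly on A is undefined in this sketch.
        dist = min(abs(x - a) for a in avoid)
        return func(x) / dist ** power
    return walled


def f(x: float) -> float:
    """Example function with roots at -1 and +1."""
    return x * x - 1.0


first_root = newton(f, 2.0)                            # lands on the root near +1
second_root = newton(with_wall(f, [first_root]), 2.0)  # same start, other root
```

The power must be large enough to cancel the zero of `f` at the avoided point; for a simple root, `power=1.0` suffices, which is why the walled function no longer vanishes near `first_root`.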
    Improving Robustness of Deep Convolutional Neural Networks via Multiresolution Learning. (arXiv:2309.13752v2 [cs.LG] UPDATED)
    The current learning process of deep learning, regardless of any deep neural network (DNN) architecture and/or learning algorithm used, is essentially a single resolution training. We explore multiresolution learning and show that multiresolution learning can significantly improve robustness of DNN models for both 1D signal and 2D signal (image) prediction problems. We demonstrate this improvement in terms of both noise and adversarial robustness as well as with small training dataset size. Our results also suggest that it may not be necessary to trade standard accuracy for robustness with multiresolution learning, which is, interestingly, contrary to the observation obtained from the traditional single resolution learning setting.
    HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation. (arXiv:2305.03942v3 [cs.RO] UPDATED)
    Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
    A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning. (arXiv:2308.13978v2 [cs.AI] UPDATED)
    Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard combinatorial optimization problems in the form of binary variables. The Hamiltonian function is often used to formulate QUBO problems where it is used as the objective function in the context of optimization. Recently, PI-GNN, a generic scalable framework, has been proposed to address the Combinatorial Optimization (CO) problems over graphs based on a simple Graph Neural Network (GNN) architecture. Their novel contribution was a generic QUBO-formulated Hamiltonian-inspired loss function that was optimized using GNN. In this study, we address a crucial issue related to the aforementioned setup especially observed in denser graphs. The reinforcement learning-based paradigm has also been widely used to address numerous CO problems. Here we also formulate and empirically evaluate the compatibility of the QUBO-formulated Hamiltonian as the generic reward function in the Reinforcement Learning paradigm to directly integrate the actual node projection status during training as the form of rewards. In our experiments, we observed up to 44% improvement in the RL-based setup compared to the PI-GNN algorithm. Our implementation can be found in https://github.com/rizveeredwan/learning-graph-structure.
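    As a concrete instance of a QUBO Hamiltonian used as an objective, the sketch below encodes Max-Cut on a small graph as $H(x) = x^\top Q x$. Max-Cut is chosen here only for illustration (the abstract does not name a problem), and brute-force enumeration stands in for the GNN or RL optimizer; the function names are hypothetical.

```python
from itertools import product


def maxcut_qubo(n, edges):
    """Build an upper-triangular QUBO matrix Q for Max-Cut on n nodes."""
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        a, b = min(i, j), max(i, j)
        Q[a][b] += 2   # coupling term 2 * x_a * x_b
        Q[a][a] -= 1   # linear term -x_a (x * x == x for binary x)
        Q[b][b] -= 1   # linear term -x_b
    return Q


def hamiltonian(Q, x):
    """Evaluate H(x) = x^T Q x for a binary assignment x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle
Q = maxcut_qubo(4, edges)
# Exhaustive search over all 2^4 assignments; only feasible for tiny graphs.
best = min(product([0, 1], repeat=4), key=lambda x: hamiltonian(Q, x))
cut_size = -hamiltonian(Q, best)
```

    Each edge contributes $2x_i x_j - x_i - x_j$, which is $-1$ when the edge is cut and $0$ otherwise, so minimizing $H$ maximizes the cut. In an RL setup, $-H$ could serve as the reward for a proposed assignment.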
    Temporal Graph Benchmark for Machine Learning on Temporal Graphs. (arXiv:2307.01026v2 [cs.LG] UPDATED)
    We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
    Model Sparsity Can Simplify Machine Unlearning. (arXiv:2304.04934v8 [cs.LG] UPDATED)
    In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse model prior into the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest unlearning methods) when using sparsity-aware unlearning. Furthermore, we demonstrate the practical impact of our proposed MU methods in addressing other machine learning challenges, such as defending against backdoor attacks and enhancing transfer learning. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
    Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions. (arXiv:2305.18471v2 [cs.LG] UPDATED)
    We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions. The proof is essentially based on a novel auxiliary function $\xi$ that helps eliminate the complexity of handling the correlation between the numerator and denominator of AdaGrad's update. Leveraging simple proofs, we are able to obtain tighter results than existing results (Faw et al., 2022) and extend the analysis to several new and important cases. Specifically, for the over-parameterized regime, we show that AdaGrad needs only $\mathcal{O}(\frac{1}{\varepsilon^2})$ iterations to ensure the gradient norm is smaller than $\varepsilon$, which matches the rate of SGD and is significantly tighter than existing rates $\mathcal{O}(\frac{1}{\varepsilon^4})$ for AdaGrad. We then discard the bounded smoothness assumption and consider a realistic assumption on smoothness called the $(L_0,L_1)$-smooth condition, which allows local smoothness to grow with the gradient norm. Again based on the auxiliary function $\xi$, we prove that AdaGrad succeeds in converging under the $(L_0,L_1)$-smooth condition as long as the learning rate is lower than a threshold. Interestingly, we further show that the requirement on learning rate under the $(L_0,L_1)$-smooth condition is necessary via proof by contradiction, in contrast with the case of uniform smoothness conditions where convergence is guaranteed regardless of learning rate choices. Together, our analyses broaden the understanding of AdaGrad and demonstrate the power of the new auxiliary function in the investigations of AdaGrad.
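    For readers unfamiliar with the update being analyzed, here is a minimal sketch of coordinate-wise AdaGrad with exact (noise-free) gradients. The test objective, learning rate, and step count are illustrative choices, not taken from the paper.

```python
import math


def adagrad(grad, x0, lr=0.5, steps=500, eps=1e-8):
    """Coordinate-wise AdaGrad: scale each step by the root of summed squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)   # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # gi sits in the numerator and, through accum, in the denominator:
            # this is the correlation the abstract refers to.
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x


# Hypothetical non-convex objective f(x) = x^4 - 2x^2, with minima at x = -1 and x = +1.
grad = lambda x: [4 * x[0] ** 3 - 4 * x[0]]
x_final = adagrad(grad, [2.0])
```

    Because the current gradient enters both the numerator and the accumulated denominator, the step is not an unbiased scaling of the gradient under noise, which is what makes the analysis delicate.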
    Transformer-VQ: Linear-Time Transformers via Vector Quantization. (arXiv:2309.16354v1 [cs.LG])
    We introduce Transformer-VQ, a decoder-only transformer computing softmax-based dense self-attention in linear time. Transformer-VQ's efficient attention is enabled by vector-quantized keys and a novel caching mechanism. In large-scale experiments, Transformer-VQ is shown highly competitive in quality, with strong results on Enwik8 (0.99 bpb), PG-19 (26.6 ppl), and ImageNet64 (3.16 bpb). Code: https://github.com/transformer-vq/transformer_vq
    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v4 [cs.LG] UPDATED)
    We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
    Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning. (arXiv:2305.16912v2 [cs.LG] UPDATED)
    In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set, which consists of one ground-truth label and several false positive labels. Multi-instance partial-label learning (MIPL) is a learning paradigm to deal with such tasks and has achieved favorable performances. The existing MIPL approach follows the instance-space paradigm by assigning augmented candidate label sets of bags to each instance and aggregating bag-level labels from instance-level labels. However, this scheme may be suboptimal as global bag-level information is ignored and the predicted labels of bags are sensitive to predictions of negative instances. In this paper, we study an alternative scheme where a multi-instance bag is embedded into a single vector representation. Accordingly, an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning, is proposed. DEMIPL employs a disambiguation attention mechanism to aggregate a multi-instance bag into a single vector representation, followed by a momentum-based disambiguation strategy to identify the ground-truth label from the candidate label set. Furthermore, we introduce a real-world MIPL dataset for colorectal cancer classification. Experimental results on benchmark and real-world datasets validate the superiority of DEMIPL against the compared MIPL and partial-label learning approaches.
    Online Distribution Shift Detection via Recency Prediction. (arXiv:2211.09916v3 [cs.RO] UPDATED)
    When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate - i.e., when there is no distribution shift, our system is very unlikely (with probability $< \epsilon$) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
    Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification. (arXiv:2301.11562v5 [cs.LG] UPDATED)
    Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
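    One way to make the idea of self-consistency concrete is as the fraction of pairs of trained models that agree on a given example. The sketch below uses that formalization, which is an assumption and may differ from the paper's exact definition; the abstention threshold and function names are likewise hypothetical.

```python
from itertools import combinations


def self_consistency(predictions):
    """Fraction of distinct model pairs that agree on one example (needs >= 2 models)."""
    pairs = list(combinations(predictions, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def predict_or_abstain(predictions, threshold=0.75):
    """Return the majority label, or None when agreement is below the threshold."""
    if self_consistency(predictions) < threshold:
        return None
    return max(set(predictions), key=predictions.count)


split_votes = [1, 1, 1, 0]       # labels from four models trained on different resamples
unanimous_votes = [1, 1, 1, 1]
sc_split = self_consistency(split_votes)
decision_split = predict_or_abstain(split_votes)
decision_unanimous = predict_or_abstain(unanimous_votes)
```

    Under this definition, a single dissenting model out of four already halves the pairwise agreement, so the ensemble abstains on the first example and predicts on the second.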
    On Learning with LAD. (arXiv:2309.16630v1 [cs.LG])
    The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
    Open Source Infrastructure for Differentiable Density Functional Theory. (arXiv:2309.15985v1 [cs.LG])
    Learning exchange correlation functionals, used in quantum chemistry calculations, from data has become increasingly important in recent years, but training such a functional requires sophisticated software infrastructure. For this reason, we build open source infrastructure to train neural exchange correlation functionals. We aim to standardize the processing pipeline by adapting state-of-the-art techniques from work done by multiple groups. We have open sourced the model in the DeepChem library to provide a platform for additional research on differentiable quantum chemistry methods.
    Dice Semimetric Losses: Optimizing the Dice Score with Soft Labels. (arXiv:2303.16296v3 [cs.CV] UPDATED)
    The soft Dice loss (SDL) has taken a pivotal role in numerous automated segmentation pipelines in the medical imaging community. Over the last years, some reasons behind its superior functioning have been uncovered and further optimizations have been explored. However, there is currently no implementation that supports its direct utilization in scenarios involving soft labels. Hence, a synergy between the use of SDL and research leveraging the use of soft labels, also in the context of model calibration, is still missing. In this work, we introduce Dice semimetric losses (DMLs), which (i) are by design identical to SDL in a standard setting with hard labels, but (ii) can be employed in settings with soft labels. Our experiments on the public QUBIQ, LiTS and KiTS benchmarks confirm the potential synergy of DMLs with soft labels (e.g. averaging, label smoothing, and knowledge distillation) over hard labels (e.g. majority voting and random selection). As a result, we obtain superior Dice scores and model calibration, which supports the wider adoption of DMLs in practice. The code is available at https://github.com/zifuwanggg/JDTLosses.
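    Properties (i) and (ii) can be checked numerically. The sketch below compares the usual soft Dice loss with an L1-based variant that coincides with it on hard labels; treating this variant as the paper's DML is an assumption, and both functions assume non-negative inputs with a non-zero total.

```python
def soft_dice_loss(pred, target):
    """Standard soft Dice loss: 1 - 2<p, t> / (|p|_1 + |t|_1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - 2.0 * inter / (sum(pred) + sum(target))


def dice_semimetric_loss(pred, target):
    """L1-based variant: 1 - (|p|_1 + |t|_1 - |p - t|_1) / (|p|_1 + |t|_1)."""
    l1_diff = sum(abs(p - t) for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (total - l1_diff) / total


pred = [0.9, 0.2, 0.7, 0.1]
hard = [1, 0, 1, 0]
gap_on_hard_labels = abs(soft_dice_loss(pred, hard) - dice_semimetric_loss(pred, hard))

soft = [0.5, 0.5]                                   # a soft label, predicted perfectly
sdl_at_perfect = soft_dice_loss(soft, soft)         # does not reach zero
dml_at_perfect = dice_semimetric_loss(soft, soft)   # reaches zero
```

    For a target in {0, 1} and a prediction in [0, 1], $|p - t| = p + t - 2pt$, so the two losses agree on hard labels. With soft labels, the standard loss is not zero even when the prediction equals the label, which is the gap the L1-based form closes.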
    Unsupervised Discovery of Extreme Weather Events Using Universal Representations of Emergent Organization. (arXiv:2304.12586v2 [physics.comp-ph] UPDATED)
    Spontaneous self-organization is ubiquitous in systems far from thermodynamic equilibrium. While organized structures that emerge dominate transport properties, universal representations that identify and describe these key objects remain elusive. Here, we introduce a theoretically-grounded framework for describing emergent organization that, via data-driven algorithms, is constructive in practice. Its building blocks are spacetime lightcones that embody how information propagates across a system through local interactions. We show that predictive equivalence classes of lightcones -- local causal states -- capture organized behaviors and coherent structures in complex spatiotemporal systems. Employing an unsupervised physics-informed machine learning algorithm and a high-performance computing implementation, we demonstrate automatically discovering coherent structures in two real world domain science problems. We show that local causal states identify vortices and track their power-law decay behavior in two-dimensional fluid turbulence. We then show how to detect and track familiar extreme weather events -- hurricanes and atmospheric rivers -- and discover other novel coherent structures associated with precipitation extremes in high-resolution climate data at the grid-cell level.
    Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models. (arXiv:2309.16521v1 [stat.ML])
    We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction
    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space. (arXiv:2302.06807v3 [stat.ML] UPDATED)
    Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large-margin classifier setting, leading to a non-convex optimization problem. In this paper, we propose a novel large-margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem, which can be optimized using any Riemannian gradient descent technique, guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
    Efficient Adversarial Input Generation via Neural Net Patching. (arXiv:2211.16808v2 [cs.LG] UPDATED)
    The generation of adversarial inputs has become a crucial issue in establishing the robustness and trustworthiness of deep neural nets, especially when they are used in safety-critical application domains such as autonomous vehicles and precision medicine. However, the problem poses multiple practical challenges, including scalability issues owing to large-sized networks, and the generation of adversarial inputs that lack important qualities such as naturalness and output-impartiality. This problem shares its end goal with the task of patching neural nets where small changes in some of the network's weights need to be discovered so that upon applying these changes, the modified net produces the desirable output for a given set of inputs. We exploit this connection by proposing to obtain an adversarial input from a patch, with the underlying observation that the effect of changing the weights can also be brought about by changing the inputs instead. Thus, this paper presents a novel way to generate input perturbations that are adversarial for a given network by using an efficient network patching technique. We note that the proposed method is significantly more effective than the prior state-of-the-art techniques.
    M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning. (arXiv:2309.16578v1 [stat.ML])
    Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
    D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation. (arXiv:2309.16118v1 [cs.RO])
    Scene representation has been a crucial design choice in robotic manipulation systems. An ideal representation should be 3D, dynamic, and semantic to meet the demands of diverse manipulation tasks. However, previous works often lack all three properties simultaneously. In this work, we introduce D$^3$Fields - dynamic 3D descriptor fields. These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks. Specifically, we project arbitrary 3D points in the workspace onto multi-view 2D visual observations and interpolate features derived from foundational models. The resulting fused descriptor fields allow for flexible goal specifications using 2D images with varied contexts, styles, and instances. To evaluate the effectiveness of these descriptor fields, we apply our representation to a wide range of robotic manipulation tasks in a zero-shot manner. Through extensive evaluation in both real-world scenarios and simulations, we demonstrate that D$^3$Fields are both generalizable and effective for zero-shot robotic manipulation tasks. In quantitative comparisons with state-of-the-art dense descriptors, such as Dense Object Nets and DINO, D$^3$Fields exhibit significantly better generalization abilities and manipulation accuracy.
    Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective. (arXiv:2309.16456v1 [cs.LG])
    Existing approaches defend against backdoor attacks in federated learning (FL) mainly through a) mitigating the impact of infected models, or b) excluding infected models. The former negatively impacts model accuracy, while the latter usually relies on globally clear boundaries between benign and infected model updates. However, in reality, model updates are easily mixed and scattered because of the diverse distributions of local data. This work focuses on excluding infected models in FL. Unlike previous perspectives from a global view, we propose Snowball, a novel anti-backdoor FL framework through bidirectional elections from an individual perspective inspired by one principle deduced by us and two principles in FL and deep learning. It is characterized by a) bottom-up election, where each candidate model update votes for several peer updates such that a few model updates are elected as selectees for aggregation; and b) top-down election, where selectees progressively enlarge themselves by picking up from the candidates. We compare Snowball with state-of-the-art defenses to backdoor attacks in FL on five real-world datasets, demonstrating its superior resistance to backdoor attacks and slight impact on the accuracy of the global model.
    DeepPCR: Parallelizing Sequential Operations in Neural Networks. (arXiv:2309.16318v1 [cs.LG])
    Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a computational cost proportional to the number of steps involved, presenting a potential bottleneck as the number of steps increases. In this work, we introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations used in inference and training of neural networks. DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. This reduces the complexity of computing the sequential operations from $\mathcal{O}(L)$ to $\mathcal{O}(\log_2L)$, thus yielding a speedup for large $L$. To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to $30\times$ for forward and $200\times$ for backward pass. We additionally showcase the flexibility of DeepPCR by parallelizing training of ResNets with as many as 1024 layers, and generation in diffusion models, enabling up to $7\times$ faster training and $11\times$ faster generation, respectively, when compared to the sequential approach.
    CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. (arXiv:2309.16563v1 [stat.ML])
    We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in the Gaussian model.
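    The role of medians under unbounded corruption can be seen in a small numerical sketch. This is not CRIMED itself; the sample values, the corruption magnitude, and the choice of which samples to corrupt are illustrative assumptions.

```python
import statistics

# 21 clean observations, symmetric around the true location 0.
clean = [0.1 * k for k in range(-10, 11)]

# An adversary replaces 6 of the 21 observations (a fraction below 1/2)
# with an arbitrarily large value.
corrupted = [1e9] * 6 + clean[6:]

mean_est = statistics.mean(corrupted)      # dragged arbitrarily far by the corruption
median_est = statistics.median(corrupted)  # shifted, but stays within the clean range
```

    With fewer than half of the observations corrupted, the median is biased but remains bounded by the clean samples, whereas the empirical mean can be moved without limit.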
    E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network. (arXiv:2309.16117v1 [cs.LG])
    Continual Learning methods are designed to learn new tasks without erasing previous knowledge. However, Continual Learning often requires massive computational power and storage capacity for satisfactory performance. In this paper, we propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net). Leveraging core subnet distillation and precise replay sample selection, E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints, all while minimizing processing time. In E2Net, we propose Representative Network Distillation to identify the representative core subnet by assessing parameter quantity and output similarity with the working network, distilling analogous subnets within the working network to mitigate reliance on rehearsal buffers and facilitating knowledge transfer across previous tasks. To enhance storage resource utilization, we then propose Subnet Constraint Experience Replay to optimize rehearsal efficiency through a sample storage strategy based on the structures of representative networks. Extensive experiments conducted predominantly on cloud environments with diverse datasets and also spanning the edge environment demonstrate that E2Net consistently outperforms state-of-the-art methods. In addition, our method outperforms competitors in terms of both storage and computational requirements.
    Geodesic Regression Characterizes 3D Shape Changes in the Female Brain During Menstruation. (arXiv:2309.16662v1 [cs.CV])
    Women are at higher risk of Alzheimer's and other neurological diseases after menopause, and yet research connecting female brain health to sex hormone fluctuations is limited. We seek to investigate this connection by developing tools that quantify 3D shape changes that occur in the brain during sex hormone fluctuations. Geodesic regression on the space of 3D discrete surfaces offers a principled way to characterize the evolution of a brain's shape. However, in its current form, this approach is too computationally expensive for practical use. In this paper, we propose approximation schemes that accelerate geodesic regression on shape spaces of 3D discrete surfaces. We also provide rules of thumb for when each approximation can be used. We test our approach on synthetic data to quantify the speed-accuracy trade-off of these approximations and show that practitioners can expect very significant speed-up while only sacrificing little accuracy. Finally, we apply the method to real brain shape data and produce the first characterization of how the female hippocampus changes shape during the menstrual cycle as a function of progesterone: a characterization made (practically) possible by our approximation schemes. Our work paves the way for comprehensive, practical shape analyses in the fields of bio-medicine and computer vision. Our implementation is publicly available on GitHub: https://github.com/bioshape-lab/my28brains.
    Hierarchical Network Data Analytics Framework for B5G Network Automation: Design and Implementation. (arXiv:2309.16269v1 [cs.NI])
    5G introduced modularized network functions (NFs) to support emerging services in a more flexible and elastic manner. To mitigate the complexity in such modularized NF management, automated network operation and management are indispensable, and thus the 3rd generation partnership project (3GPP) has introduced a network data analytics function (NWDAF). However, a conventional NWDAF needs to conduct both inference and training tasks, and thus it is difficult to provide the analytics results to NFs in a timely manner for an increased number of analytics requests. In this article, we propose a hierarchical network data analytics framework (H-NDAF) where inference tasks are distributed to multiple leaf NWDAFs and training tasks are conducted at the root NWDAF. Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF can provide sufficiently accurate analytics and faster analytics provision time compared to the conventional NWDAF.
    Projection based fuzzy least squares twin support vector machine for class imbalance problems. (arXiv:2309.15886v1 [cs.LG])
    Class imbalance is a major problem in many real world classification tasks. Due to the imbalance in the number of samples, the support vector machine (SVM) classifier gets biased toward the majority class. Furthermore, these samples are often observed with a certain degree of noise. Therefore, to address these problems, we propose a novel fuzzy-based approach to deal with class-imbalanced as well as noisy datasets. We propose two approaches to address these problems. The first approach is based on the intuitionistic fuzzy membership, termed robust energy-based intuitionistic fuzzy least squares twin support vector machine (IF-RELSTSVM). Furthermore, we introduce the concept of hyperplane-based fuzzy membership in our second approach, where the final classifier is termed robust energy-based fuzzy least squares twin support vector machine (F-RELSTSVM). By using this technique, the membership values are based on a projection-based approach, where the data points are projected on the hyperplanes. The performance of the proposed algorithms is evaluated on several benchmark and synthetic datasets. The experimental results show that the proposed IF-RELSTSVM and F-RELSTSVM models outperform the baseline algorithms. Statistical tests are performed to check the significance of the proposed algorithms. The results show the applicability of the proposed algorithms on noisy as well as imbalanced datasets.
    Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words. (arXiv:2309.16108v1 [cs.CV])
    Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these domains, images often contain multiple channels, each carrying semantically distinct and independent information. Furthermore, the model must demonstrate robustness to sparsity in input channels, as they may not be densely available during training or testing. In this paper, we propose a modification to the ViT architecture that enhances reasoning across the input channels and introduce Hierarchical Channel Sampling (HCS) as an additional regularization technique to ensure robustness when only partial channels are presented during test time. Our proposed model, ChannelViT, constructs patch tokens independently from each input channel and utilizes a learnable channel embedding that is added to the patch tokens, similar to positional embeddings. We evaluate the performance of ChannelViT on ImageNet, JUMP-CP (microscopy cell imaging), and So2Sat (satellite imaging). Our results show that ChannelViT outperforms ViT on classification tasks and generalizes well, even when a subset of input channels is used during testing. Across our experiments, HCS proves to be a powerful regularizer, independent of the architecture employed, suggesting itself as a straightforward technique for robust ViT training. Lastly, we find that ChannelViT generalizes effectively even when there is limited access to all channels during training, highlighting its potential for multi-channel imaging under real-world conditions with sparse sensors.
    Unified Long-Term Time-Series Forecasting Benchmark. (arXiv:2309.15946v1 [cs.LG])
    In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
    Max-Sliced Mutual Information. (arXiv:2309.16200v1 [cs.LG])
    Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.
    Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies. (arXiv:2309.16025v1 [cs.LG])
    Current methods of imitation learning (IL), primarily based on deep neural networks, offer efficient means for obtaining driving policies from real-world data but suffer from significant limitations in interpretability and generalizability. These shortcomings are particularly concerning in safety-critical applications like autonomous driving. In this paper, we address these limitations by introducing Symbolic Imitation Learning (SIL), a groundbreaking method that employs Inductive Logic Programming (ILP) to learn driving policies which are transparent, explainable and generalisable from available datasets. Utilizing the real-world highD dataset, we subject our method to a rigorous comparative analysis against prevailing neural-network-based IL methods. Our results demonstrate that SIL not only enhances the interpretability of driving policies but also significantly improves their applicability across varied driving situations. Hence, this work offers a novel pathway to more reliable and safer autonomous driving systems, underscoring the potential of integrating ILP into the domain of IL.
    Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images. (arXiv:2309.16066v1 [cs.LG])
    This work reports the empirical performance of an automated medical landmark detection method for predicting clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample-efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.
    Imbalanced Data Stream Classification using Dynamic Ensemble Selection. (arXiv:2309.09175v2 [cs.LG] UPDATED)
    Modern streaming data categorization faces significant challenges from concept drift and class imbalanced data. This negatively impacts the output of the classifier, leading to improper classification. Furthermore, other factors such as the overlapping of multiple classes limit the extent of the correctness of the output. This work proposes a novel framework for integrating data pre-processing and dynamic ensemble selection, by formulating the classification framework for the nonstationary drifting imbalanced data stream, which employs the data pre-processing and dynamic ensemble selection techniques. The proposed framework was evaluated using six artificially generated data streams with differing imbalance ratios in combination with two different types of concept drifts. Each stream is composed of 200 chunks of 500 objects described by eight features and contains five concept drifts. Seven pre-processing techniques and two dynamic ensemble selection methods were considered. According to experimental results, data pre-processing combined with Dynamic Ensemble Selection techniques delivers significantly higher accuracy when dealing with imbalanced data streams.
    Just Noticeable Difference Modeling for Face Recognition System. (arXiv:2209.05856v2 [cs.CV] UPDATED)
    High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose the powerful identity information, resulting in the performance degradation of the FR system. Herein, we make the first attempt to study just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset including 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, to maximize redundancy removal without impairing robust identity information, we apply the encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. Then, the residual feature is fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results have demonstrated that the proposed model achieves higher accuracy of JND map prediction compared with the state-of-the-art JND models, and is capable of saving more bits while maintaining the performance of the FR system compared with VTM-15.0.
    High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models. (arXiv:2309.15889v1 [eess.IV])
    We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. We introduce a novel scheme that utilizes the range-null space decomposition of the target image. We transmit the range-space of the image after encoding and employ DDPM to progressively refine its null space contents. Through extensive experiments, we demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method. We will publicly share our source code to facilitate further research and reproducibility.
    Classical-to-quantum convolutional neural network transfer learning. (arXiv:2208.14708v2 [quant-ph] UPDATED)
    Machine learning using quantum convolutional neural networks (QCNNs) has demonstrated success in both quantum and classical data classification. In previous studies, QCNNs attained a higher classification accuracy than their classical counterparts under the same training conditions in the few-parameter regime. However, the general performance of large-scale quantum models is difficult to examine because of the limited size of quantum circuits, which can be reliably implemented in the near future. We propose transfer learning as an effective strategy for utilizing small QCNNs in the noisy intermediate-scale quantum era to the full extent. In the classical-to-quantum transfer learning framework, a QCNN can solve complex classification problems without requiring a large-scale quantum circuit by utilizing a pre-trained classical convolutional neural network (CNN). We perform numerical simulations of QCNN models with various sets of quantum convolution and pooling operations for MNIST data classification under transfer learning, in which a classical CNN is trained with Fashion-MNIST data. The results show that transfer learning from classical to quantum CNN performs considerably better than purely classical transfer learning models under similar training conditions.
    Discouraging posterior collapse in hierarchical Variational Autoencoders using context. (arXiv:2302.09976v2 [cs.LG] UPDATED)
    Hierarchical Variational Autoencoders (VAEs) are among the most popular likelihood-based generative models. There is a consensus that the top-down hierarchical VAEs allow effective learning of deep latent structures and avoid problems like posterior collapse. Here, we show that this is not necessarily the case, and the problem of collapsing posteriors remains. To discourage this issue, we propose a deep hierarchical VAE with a context on top. Specifically, we use a Discrete Cosine Transform to obtain the last latent variable. In a series of experiments, we observe that the proposed modification allows us to achieve better utilization of the latent space and does not harm the model's generative abilities.
    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. (arXiv:2308.03666v3 [stat.ML] UPDATED)
    As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.
    Latent Graph Powered Semi-Supervised Learning on Biomedical Tabular Data. (arXiv:2309.15757v2 [cs.LG] UPDATED)
    In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data. In this work, we address this limitation by providing an approach for inferring latent graphs that capture the intrinsic data relationships. By leveraging graph-based representations, our approach facilitates the seamless propagation of information throughout the graph, enabling the effective incorporation of global and local knowledge. Through evaluations on biomedical tabular datasets, we compare the capabilities of our approach to other contemporary methods. Our work demonstrates the significance of inter-instance relationship discovery as practical means for constructing robust latent graphs to enhance semi-supervised learning techniques. Our method achieves state-of-the-art results on three biomedical datasets.
    Enhancing Sharpness-Aware Optimization Through Variance Suppression. (arXiv:2309.15639v2 [cs.LG] UPDATED)
    Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
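    The "minimize the maximum loss within a neighborhood" idea behind SAM can be sketched as a two-step update: first move the parameters a distance `rho` along the normalized gradient (the adversarial perturbation), then descend using the gradient taken at that perturbed point. The code below is a sketch of plain SAM on a toy quadratic loss, not of VaSSO; the loss, the helper names, and the values of `lr` and `rho` are assumptions chosen for illustration.

```python
import math


def loss(w):
    # Toy loss f(w) = sum_i w_i^2 (hypothetical stand-in for a network loss).
    return sum(wi * wi for wi in w)


def grad(w):
    # Analytic gradient of the toy loss.
    return [2.0 * wi for wi in w]


def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: ascend to w + rho * g / ||g||, then descend from w
    using the gradient evaluated at that perturbed point."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against g == 0
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, -2.0]
initial_loss = loss(w)
for _ in range(50):
    w = sam_step(w)
final_loss = loss(w)
print(initial_loss, final_loss)
```

    With a fixed `rho` the iterates settle in a small neighborhood of the minimum rather than converging to it exactly, because the perturbed gradient never vanishes at the minimizer.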
    Deep learning for bias-correcting CMIP6-class Earth system models. (arXiv:2301.01253v3 [physics.ao-ph] UPDATED)
    The accurate representation of precipitation in Earth system models (ESMs) is crucial for reliable projections of the ecological and socioeconomic impacts in response to anthropogenic global warming. The complex cross-scale interactions of processes that produce precipitation are challenging to model, however, inducing potentially strong biases in ESM fields, especially regarding extremes. State-of-the-art bias correction methods only address errors in the simulated frequency distributions locally at every individual grid cell. Improving unrealistic spatial patterns of the ESM output, which would require spatial context, has not been possible so far. Here, we show that a post-processing method based on physically constrained generative adversarial networks (cGANs) can correct biases of a state-of-the-art, CMIP6-class ESM both in local frequency distributions and in the spatial patterns at once. While our method improves local frequency distributions equally well as gold-standard bias-adjustment frameworks, it strongly outperforms any existing methods in the correction of spatial patterns, especially in terms of the characteristic spatial intermittency of precipitation extremes.
    Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs. (arXiv:2309.05516v2 [cs.CL] UPDATED)
    Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound competes impressively against recent methods without introducing additional inference overhead. The source code will be publicly available at https://github.com/intel/neural-compressor soon.
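    The general idea of tuning rounding decisions with signed gradient descent can be sketched on a single weight vector. This is a simplified illustration under stated assumptions, not the SignRound implementation: the function names, the straight-through gradient approximation, the step size, the fixed `scale`, and the choice to keep the best perturbation seen are all hypothetical. Each weight gets a perturbation `v` confined to [-0.5, 0.5], so it can only flip the weight between rounding up and rounding down.

```python
def quantize(w, scale, v):
    # Round each weight to the grid after adding its perturbation v_i.
    return [scale * round(wi / scale + vi) for wi, vi in zip(w, v)]


def output_error(w, q, xs):
    # Squared error between full-precision and quantized dot products.
    return sum(
        sum((wi - qi) * xi for wi, qi, xi in zip(w, q, x)) ** 2 for x in xs
    )


def sign(z):
    return (z > 0) - (z < 0)


def tune_rounding(w, scale, xs, steps=200, lr=0.005):
    v = [0.0] * len(w)  # v = 0 is plain round-to-nearest
    best_v, best_err = v[:], output_error(w, quantize(w, scale, v), xs)
    for _ in range(steps):
        q = quantize(w, scale, v)
        grads = [0.0] * len(w)
        for x in xs:
            resid = sum((qi - wi) * xi for wi, qi, xi in zip(w, q, x))
            for i, xi in enumerate(x):
                # Straight-through estimate: treat d q_i / d v_i as `scale`.
                grads[i] += 2.0 * resid * xi * scale
        # Signed update: only the gradient's sign is used, then clip.
        v = [min(0.5, max(-0.5, vi - lr * sign(gi))) for vi, gi in zip(v, grads)]
        err = output_error(w, quantize(w, scale, v), xs)
        if err < best_err:
            best_v, best_err = v[:], err
    return best_v, best_err


weights = [0.26, -0.74, 0.49, 1.12]
inputs = [[1.0, 0.5, -1.0, 2.0], [0.3, -1.2, 0.8, 0.1], [2.0, 1.0, 1.0, -0.5]]
rtn_err = output_error(weights, quantize(weights, 0.5, [0.0] * 4), inputs)
best_v, best_err = tune_rounding(weights, 0.5, inputs)
print(rtn_err, best_err)
```

    Because the best perturbation is tracked from the round-to-nearest starting point, the returned error can never be worse than plain rounding on the calibration inputs.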
    DIRA: Dynamic Domain Incremental Regularised Adaptation. (arXiv:2205.00147v4 [cs.LG] UPDATED)
    Autonomous systems (AS) often use Deep Neural Network (DNN) classifiers to allow them to operate in complex, high-dimensional, non-linear, and dynamically changing environments. Due to the complexity of these environments, DNN classifiers may output misclassifications during operation when they face domains not identified during development. Removing a system from operation for retraining becomes impractical as the number of such AS increases. To increase AS reliability and overcome this limitation, DNN classifiers need to have the ability to adapt during operation when faced with different operational domains using a few samples (e.g. 100 samples). However, retraining DNNs on a few samples is known to cause catastrophic forgetting. In this paper, we introduce Dynamic Incremental Regularised Adaptation (DIRA), a framework for operational domain adaptation of DNN classifiers using regularisation techniques to overcome catastrophic forgetting and achieve adaptation when retraining using a few samples of the target domain. Our approach shows improvements on different image classification benchmarks aimed at evaluating robustness to distribution shifts (e.g. CIFAR-10C/100C, ImageNet-C), and produces state-of-the-art performance in comparison with other frameworks from the literature.
    STAG: Enabling Low Latency and Low Staleness of GNN-based Services with Dynamic Graphs. (arXiv:2309.15875v1 [cs.LG])
    Many emerging user-facing services adopt Graph Neural Networks (GNNs) to improve serving accuracy. When the graph used by a GNN model changes, representations (embedding) of nodes in the graph should be updated accordingly. However, the node representation update is too slow, resulting in either long response latency of user queries (the inference is performed after the update completes) or high staleness problem (the inference is performed based on stale data). Our in-depth analysis shows that the slow update is mainly due to neighbor explosion problem in graphs and duplicated computation. Based on such findings, we propose STAG, a GNN serving framework that enables low latency and low staleness of GNN-based services. It comprises a collaborative serving mechanism and an additivity-based incremental propagation strategy. With the collaborative serving mechanism, only part of node representations are updated during the update phase, and the final representations are calculated in the inference phase. It alleviates the neighbor explosion problem. The additivity-based incremental propagation strategy reuses intermediate data during the update phase, eliminating duplicated computation problem. Experimental results show that STAG accelerates the update phase by 1.3x~90.1x, and greatly reduces staleness time with a slight increase in response latency.
    IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function. (arXiv:2304.06366v2 [cs.AI] UPDATED)
    Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.
  • Open

    Learning Interpretable Characteristic Kernels via Decision Forests. (arXiv:1812.00029v3 [stat.ML] UPDATED)
    Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates the decision forest-based kernel can be more powerful and more interpretable than existing methods, flying in the face of conventional wisdom of the trade-off between the two.
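    The leaf-node proximity mentioned in this abstract can be sketched directly: two samples are "close" in one tree if they fall in the same leaf, and the proximity matrix averages that indicator over all trees. The code below is a minimal illustration of this generic construction, not the KMERF method itself; the function name and the hand-written leaf assignments are hypothetical.

```python
def proximity_kernel(leaf_ids):
    """Build a forest proximity matrix.

    leaf_ids[t][i] is the index of the leaf that sample i reaches in tree t.
    K[i][j] is the fraction of trees in which samples i and j share a leaf.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    K = [[0.0] * n_samples for _ in range(n_samples)]
    for leaves in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if leaves[i] == leaves[j]:
                    K[i][j] += 1.0 / n_trees
    return K


# Two trees, three samples (hypothetical leaf assignments).
leaf_ids = [
    [0, 0, 1],  # tree 0: samples 0 and 1 share a leaf
    [0, 1, 1],  # tree 1: samples 1 and 2 share a leaf
]
K = proximity_kernel(leaf_ids)
for row in K:
    print(row)
```

    The resulting matrix is symmetric with ones on the diagonal, since every sample always shares a leaf with itself.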
    Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy. (arXiv:1911.09307v2 [cs.LG] UPDATED)
    Regularization plays a crucial role in machine learning models, especially for deep neural networks. The existing regularization techniques mainly rely on the i.i.d. assumption and only consider the knowledge from the current sample, without leveraging the neighboring relationship between samples. In this work, we propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that conducts a non-local representation in the computation of networks. Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two kinds of popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp as well as its variants. The first derived Pani VAT presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived Pani MixUp method extends MixUp, and achieves superiority over MixUp and competitive performance over state-of-the-art variants of the MixUp method with a significant advantage in computational efficiency. Extensive experiments have verified the effectiveness of our Pani approach in both supervised and semi-supervised settings.
    Nonparametric plug-in classifier for multiclass classification of S.D.E. paths. (arXiv:2212.10259v2 [math.ST] UPDATED)
    We study the multiclass classification problem where the features come from the mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier which relies on nonparametric estimators of the drift and diffusion functions. We first establish the consistency of our classification procedure under mild assumptions and then provide rates of convergence under different sets of assumptions. Finally, a numerical study supports our theoretical findings.
    Asset Bundling for Wind Power Forecasting. (arXiv:2309.16492v1 [stat.ME])
    The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation techniques. The BPR framework first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet level, and finally reconciles all forecasts to ensure consistency. This approach effectively introduces an auxiliary learning task (predicting the bundle-level time series) to help the main learning tasks. The paper also introduces new asset-bundling criteria that capture the spatio-temporal dynamics of wind power time series. Extensive numerical experiments are conducted on an industry-size dataset of 283 wind farms in the MISO footprint. The experiments consider short-term and day-ahead forecasts, and evaluate a large variety of forecasting models that include weather predictions as covariates. The results demonstrate the benefits of BPR, which consistently and significantly improves forecast accuracy over baselines, especially at the fleet level.
    High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality. (arXiv:2309.16476v1 [math.ST])
    We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
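For readers unfamiliar with the Huber loss discussed above, here is a minimal sketch of its standard textbook definition. The function name `huber_loss` is illustrative only, and `delta` plays the role of the abstract's $\delta$; nothing here reproduces the paper's asymptotic analysis.

```python
def huber_loss(residual: float, delta: float) -> float:
    """Standard Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= delta:
        # Small residuals are penalised like squared error.
        return 0.5 * a * a
    # Large residuals grow only linearly, which limits the influence of outliers.
    return delta * (a - 0.5 * delta)
```

The two branches meet continuously at `|residual| == delta`, so `delta` controls where the loss switches from least-squares behaviour to absolute-error behaviour.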
    Flexible and efficient spatial extremes emulation via variational autoencoders. (arXiv:2307.08079v2 [stat.ML] UPDATED)
    Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of Red Sea and it has decreased slightly over time.
    Unsupervised Fact Verification by Language Model Distillation. (arXiv:2309.16540v1 [cs.CL])
    Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.  ( 2 min )
    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. (arXiv:2309.16620v1 [stat.ML])
    The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
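The effect of a $1/\sqrt{\text{depth}}$ residual branch scale on forward signal size can be illustrated with a toy experiment. The sketch below is an assumption-laden simplification, not the paper's setup: it uses purely linear residual branches with Gaussian weights of variance 1/width, and it does not implement $\mu$P or any training. The function name `residual_forward_norm_ratio` is hypothetical.

```python
import math
import random


def residual_forward_norm_ratio(depth, width, branch_scale, seed=0):
    """Return ||h_L||^2 / ||h_0||^2 for h_{l+1} = h_l + branch_scale * W_l h_l."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    start = sum(v * v for v in h)
    std = 1.0 / math.sqrt(width)  # weight entries have variance 1/width
    for _ in range(depth):
        # Fresh random linear branch at every layer (toy stand-in for a residual block).
        branch = [sum(rng.gauss(0.0, std) * v for v in h) for _ in range(width)]
        h = [v + branch_scale * b for v, b in zip(h, branch)]
    return sum(v * v for v in h) / start


depth, width = 32, 16
scaled = residual_forward_norm_ratio(depth, width, 1.0 / math.sqrt(depth))
unscaled = residual_forward_norm_ratio(depth, width, 1.0)
print(f"1/sqrt(depth) branch scale: {scaled:.2f}   unit branch scale: {unscaled:.3e}")
```

In this toy model the expected squared norm grows by a factor of roughly (1 + 1/depth) per layer with the scaled branch, so it stays bounded as depth grows, whereas the unit-scale branch roughly doubles it at every layer.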
    Transport map unadjusted Langevin algorithms: learning and discretizing perturbed samplers. (arXiv:2302.07227v3 [stat.ME] UPDATED)
    Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. We also show that applying a transport map to an irreversibly-perturbed ULA results in a geometry-informed irreversible perturbation (GiIrr) of the original dynamics. These connections suggest more systematic ways of learning metrics and perturbations, and also yield alternative discretizations of the RMLD described by the map, which we study. Under appropriate conditions, these discretized processes can be endowed with non-asymptotic bounds describing convergence to the target distribution in 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.
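As a point of reference for the abstract above, the plain unadjusted Langevin algorithm iterates x <- x + h * grad log pi(x) + sqrt(2h) * noise. The sketch below is a minimal one-dimensional version with no transport map, preconditioning, or irreversible perturbation; the function name `ula_samples`, the step size, and the standard Gaussian target are assumptions chosen for illustration.

```python
import math
import random


def ula_samples(grad_log_density, x0, step, n_steps, seed=0):
    """Run the unadjusted Langevin algorithm and return every iterate."""
    rng = random.Random(seed)
    x = x0
    out = []
    for _ in range(n_steps):
        # Euler-Maruyama discretisation of dX = grad log pi(X) dt + sqrt(2) dW.
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# Target: standard Gaussian, so grad log pi(x) = -x.
samples = ula_samples(lambda x: -x, x0=0.0, step=0.1, n_steps=50000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

Because there is no Metropolis correction, the chain targets a slightly biased distribution: for this Gaussian example the stationary variance is 1 / (1 - step / 2) rather than exactly 1.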
    A parsimonious, computationally efficient machine learning method for spatial regression. (arXiv:2309.16448v1 [stat.ML])
    We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is an effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.  ( 2 min )
    A framework for paired-sample hypothesis testing for high-dimensional data. (arXiv:2309.16274v1 [stat.ML])
    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimating each feature's contribution to the final result.  ( 2 min )
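The two classical ingredients named in the abstract above can be sketched in their ordinary one-dimensional form. This is only the univariate Hodges-Lehmann pseudomedian and the Wilcoxon signed-rank statistic on paired differences; the paper's bisecting-hyperplane construction and its multivariate extension are not reproduced, and the function names are illustrative.

```python
from statistics import median


def hodges_lehmann(diffs):
    """Pseudomedian: the median of all Walsh averages (d_i + d_j) / 2 with i <= j."""
    n = len(diffs)
    walsh = [(diffs[i] + diffs[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return median(walsh)


def wilcoxon_signed_rank_statistic(diffs):
    """W+: sum of the ranks of |d| over positive differences (average ranks for ties)."""
    ordered = sorted((d for d in diffs if d != 0), key=abs)  # zero differences are dropped
    w_plus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1, ..., j
        w_plus += avg_rank * sum(1 for k in range(i, j) if ordered[k] > 0)
        i = j
    return w_plus


# Differences between paired one-dimensional scores (made-up numbers for illustration).
example_diffs = [1.0, -2.0, 3.0]
print(hodges_lehmann(example_diffs), wilcoxon_signed_rank_statistic(example_diffs))
```

Turning W+ into a p-value requires its null distribution (exact for small samples, or a normal approximation for larger ones), which is omitted here.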
    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity. (arXiv:2309.16512v1 [cs.LG])
    In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.  ( 2 min )
    Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials. (arXiv:2308.15728v2 [math.ST] UPDATED)
    Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation has been established by Gao et al (2015) for both stochastic block model (SBM) and nonparametric graphon estimation. The statistical optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. It is natural to wonder if such a gap is essential. The computational optimality of the USVT or the existence of a computational barrier in graphon estimation has been a long-standing open problem. In this work, we take the first step towards it and provide rigorous evidence for the computational barrier in graphon estimation via low-degree polynomials. Specifically, in both SBM and nonparametric graphon estimation, we show that for low-degree polynomial estimators, their estimation error rates cannot be significantly better than that of the USVT under a wide range of parameter regimes. Our results are proved based on the recent development of low-degree polynomials by Schramm and Wein (2022), while we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities and this yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery.
    A Primer on Bayesian Neural Networks: Review and Debates. (arXiv:2309.16314v1 [stat.ML])
    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples. (arXiv:2309.16143v1 [cs.LG])
    Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.  ( 3 min )
    Stackelberg Batch Policy Learning. (arXiv:2309.16188v1 [stat.ML])
    Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which show that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well as or better than state-of-the-art methods on batch RL benchmarks and real-world datasets.
    Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics. (arXiv:2309.16109v1 [cs.LG])
    Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
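The distinction the abstract draws between the L2 loss and the cosine loss comes down to feature normalization. A minimal sketch, assuming plain Python lists as representation vectors and illustrative function names (the paper's dynamics analysis is not reproduced here):

```python
import math


def l2_loss(a, b):
    """Half squared Euclidean distance between two representations."""
    return 0.5 * sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_loss(a, b):
    """Negative cosine similarity: both vectors are normalised before comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return -dot / (norm_a * norm_b)


view_1, view_2 = [1.0, 2.0], [2.0, 1.0]
rescaled = [10.0 * x for x in view_1]
# Rescaling one view changes the L2 loss but leaves the cosine loss unchanged.
print(l2_loss(view_1, view_2), l2_loss(rescaled, view_2))
print(cosine_loss(view_1, view_2), cosine_loss(rescaled, view_2))
```

The cosine loss depends only on direction, so shrinking the norm of a representation does not by itself reduce the loss.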
    Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models. (arXiv:2309.16521v1 [stat.ML])
    We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction  ( 2 min )
    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. (arXiv:2308.03666v3 [stat.ML] UPDATED)
    As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.
    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space. (arXiv:2302.06807v3 [stat.ML] UPDATED)
    Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large-margin classifier setting, leading to a non-convex optimization problem. In this paper, we propose a novel large margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem that can be optimized using any Riemannian gradient descent technique guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
    Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance. (arXiv:2309.16604v1 [stat.ML])
    Pairwise comparison of graphs is key to many applications in machine learning, ranging from clustering and kernel-based classification/regression to, more recently, supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists of representing graphs as metric measure spaces, which makes it possible to leverage Optimal Transport and its meaningful distances for comparing them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose nodes and edges both have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.
    Lossless Transformations and Excess Risk Bounds in Statistical Inference. (arXiv:2307.16735v2 [cs.IT] UPDATED)
    We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.
    Dynamic Selection in Algorithmic Decision-making. (arXiv:2108.12547v3 [econ.EM] UPDATED)
    This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
    Data Augmentation in the Underparameterized and Overparameterized Regimes. (arXiv:2202.09134v3 [cs.LG] UPDATED)
    We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables.
    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints. (arXiv:2309.16240v1 [cs.LG])
    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
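For context, the original DPO objective that $f$-DPO generalizes (the reverse KL case) is the negative log-sigmoid of a scaled difference of policy-to-reference log-ratios. The sketch below is a per-example version under stated assumptions: the inputs are already-summed sequence log-probabilities, the argument names and the default `beta` are hypothetical, and the other $f$-divergence variants from the paper are not implemented.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) = log(1 + exp(-m)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; raising the chosen response's log-probability relative to the rejected one pushes the loss below that value.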
    M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning. (arXiv:2309.16578v1 [stat.ML])
    Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
    Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions. (arXiv:2201.02958v5 [stat.ME] UPDATED)
    Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression to exploit the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (that is, the optimal rate for the standard nested simulation) and the square root convergence rate (that is, the canonical rate for the standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.
    Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification. (arXiv:2301.11562v5 [cs.LG] UPDATED)
    Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
    Selective Nonparametric Regression via Testing. (arXiv:2309.16412v1 [stat.ML])
    Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    Cross-Prediction-Powered Inference. (arXiv:2309.16598v1 [stat.ML])
    While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.
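    The impute-then-debias idea in the abstract above can be sketched for the simplest target, a population mean. This is an illustrative sketch under stated assumptions, not the paper's exact estimator or its confidence intervals: the fold scheme, the one-dimensional least-squares model, and the names `fit_line` and `cross_prediction_mean` are all hypothetical choices made for this example.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predict function."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def cross_prediction_mean(x_lab, y_lab, x_unlab, n_folds=5, seed=0):
    """Estimate E[Y] from a small labeled set and a large unlabeled set.

    Assumed scheme: for each fold, fit a model on the other folds, impute
    labels for the unlabeled data, and use the held-out fold to measure
    (and subtract) the model's average prediction error.
    """
    n = len(y_lab)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    imputed = [0.0] * len(x_unlab)   # fold-averaged predictions on unlabeled data
    residual_sum = 0.0               # held-out errors, used for debiasing
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit_line([x_lab[i] for i in train], [y_lab[i] for i in train])
        for j, x in enumerate(x_unlab):
            imputed[j] += predict(x) / n_folds
        residual_sum += sum(y_lab[i] - predict(x_lab[i]) for i in held_out)

    return fmean(imputed) + residual_sum / n
```

    The second term is what distinguishes this from plain imputation: each labeled point is only ever scored by a model that did not train on it, so the average held-out residual estimates the bias of the imputed labels.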
    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v4 [cs.LG] UPDATED)
    We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
    CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. (arXiv:2309.16563v1 [stat.ML])
    We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.
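    The observation model in the abstract above (a genuine sample with probability $1-\varepsilon$, an arbitrary corrupted value with probability $\varepsilon$) is easy to simulate. The sketch below is not the CRIMED algorithm; it only illustrates why a median-based statistic is a natural tool when corruptions are unbounded. The Pareto-scaled corruption and the specific constants are hypothetical choices for the example.

```python
import random
from statistics import fmean, median

def observe(rng, arm_mean, eps, corruption):
    """One observed reward: the arm's unit-variance Gaussian sample with
    probability 1 - eps, otherwise a draw from the corruption distribution."""
    if rng.random() < eps:
        return corruption(rng)
    return rng.gauss(arm_mean, 1.0)

rng = random.Random(0)
true_mean, eps = 1.0, 0.2

# Hypothetical corruption with unbounded support and very large values.
heavy_corruption = lambda r: 1e6 * r.paretovariate(1.0)

samples = [observe(rng, true_mean, eps, heavy_corruption) for _ in range(5000)]

mean_error = abs(fmean(samples) - true_mean)      # dominated by the corrupted draws
median_error = abs(median(samples) - true_mean)   # shifted, but by a bounded amount

print(f"|mean - mu|   = {mean_error:.3g}")
print(f"|median - mu| = {median_error:.3g}")
```

    With one-sided corruption the median is still biased, because it becomes a higher quantile of the clean distribution, but the shift depends only on $\varepsilon$ and not on how large the corrupted values are.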
    Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption. (arXiv:2309.16409v1 [stat.ML])
    The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.  ( 2 min )
    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes. (arXiv:2212.10538v2 [cs.LG] UPDATED)
    Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.  ( 2 min )
    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces. (arXiv:2309.16597v1 [cs.LG])
    Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.  ( 2 min )
    Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve. (arXiv:2309.16099v1 [math.ST])
    Flexible estimation of the mean outcome under a treatment regimen (i.e., value function) is the key step toward personalized medicine. We define our target parameter as a conditional value function given a set of baseline covariates which we refer to as a stratum based value function. We focus on semiparametric class of decision rules and propose a sieve based nonparametric covariate adjusted regimen-response curve estimator within that class. Our work contributes in several ways. First, we propose an inverse probability weighted nonparametrically efficient estimator of the smoothed regimen-response curve function. We show that asymptotic linearity is achieved when the nuisance functions are undersmoothed sufficiently. Asymptotic and finite sample criteria for undersmoothing are proposed. Second, using Gaussian process theory, we propose simultaneous confidence intervals for the smoothed regimen-response curve function. Third, we provide consistency and convergence rate for the optimizer of the regimen-response curve estimator; this enables us to estimate an optimal semiparametric rule. The latter is important as the optimizer corresponds with the optimal dynamic treatment regimen. Some finite-sample properties are explored with simulations.  ( 2 min )
    Improving Adaptive Online Learning Using Refined Discretization. (arXiv:2309.16044v1 [cs.LG])
    We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.  ( 2 min )

  • Open

    [R] Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
    Paper - https://arxiv.org/abs/2307.07162 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [Research] - Resource to query ML and LLM based research
    Made a repo for you all to try using a collaborative AI tool which includes 100+ papers on LLM-Based-Agents. You can try out the repo here: https://www.collama.ai/varun/llm-based-agents submitted by /u/_llama2 [link] [comments]  ( 9 min )
    [D] Choosing the best learning model for a start up app?
    Straight off the bat: I am not very familiar with this area, but I was tasked with finding and suggesting a reasonable model for our needs. Here is a bit of what I have read: https://www.obviously.ai/post/how-to-choose-the-right-ai-model-for-your-application https://www.addevice.io/blog/ai-framework-for-app-development The app that I am working on is an education app, and the purpose of the AI would be (at least in terms of priority) to generate a post subject line / topic to discuss. The company is super small, so money is important. JS is mainly being used at the moment. What would be a good choice for a small startup to generate topics for an education app used by schools? Any ideas or things to consider would be wonderful to get my rabbit-hole dive started! Thanks. submitted by /u/Willy988 [link] [comments]  ( 9 min )
    [R] Gsgen: Text-to-3D using Gaussian Splatting
    Project Page Paper Code In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D prior and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting the explicit nature that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D] Does anyone else feel like MOJO isn't getting the attention it deserves?
    https://docs.modular.com/mojo/ submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [P] Carton – Run any ML model from any programming language
    Hi! I just open-sourced a project that I've been working on for a while and wanted to see what you think! The goal of Carton (https://carton.run) is to let you use a single interface to run any machine learning model from any programming language. It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes. Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve. At a high level, the way Carton works is by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. There are lots more details linked in the architecture doc below. Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This is meaningful because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc without changes. This lets you keep up with cutting-edge advances, but decouples them from your application. I’ve been working on Carton for almost a year now and I open sourced it on Wednesday! Some useful links: Website, docs, quickstart - https://carton.run Explore existing models - https://carton.pub Repo - https://github.com/VivekPanyam/carton Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE.md Please let me know what you think! submitted by /u/vpanyam [link] [comments]  ( 10 min )
    [P] Location Computation
    Hi Everyone, I’m doing a project where I’m crowdsourcing a lot of location data for a set of location labels and then trying to cluster it for each and using the centroid of the cluster as the most accurate location for that location label. The data keeps coming in everyday. I’m not sure when to stop computation. Initially I thought I’ll check the delta between each days centroid computed and if the delta falls under a threshold then stop computing. But now I’m thinking if my daily data collected gets marked as outliers, subsequent days centroids won’t have much of a delta and it will pass my convergence condition. Any suggestions? submitted by /u/Longjumping-Song4958 [link] [comments]  ( 9 min )
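    The concern raised in the post above can be made concrete with a small sketch of one possible stopping rule. It is not an established method: the function name `converged`, the thresholds, and the assumption that centroids are in projected coordinates (metres) are all hypothetical. The idea is that a day whose points were mostly rejected as outliers should not count as evidence of convergence.

```python
from math import hypot

def converged(daily_centroids, daily_inlier_counts,
              tol=5.0, min_inliers=20, patience=3):
    """Return True once the centroid moved less than `tol` on `patience`
    consecutive informative days.

    A day is informative only if it contributed at least `min_inliers`
    points that survived outlier filtering; uninformative days are skipped,
    so they neither extend nor reset the streak.
    """
    streak = 0
    for day in range(1, len(daily_centroids)):
        (x0, y0), (x1, y1) = daily_centroids[day - 1], daily_centroids[day]
        moved = hypot(x1 - x0, y1 - y0)
        if daily_inlier_counts[day] < min_inliers:
            continue  # mostly outliers today: a small delta proves nothing
        streak = streak + 1 if moved < tol else 0
        if streak >= patience:
            return True
    return False

# Centroid settles while plenty of inliers arrive each day.
settling = [(0, 0), (10, 0), (12, 0), (13, 0), (13.5, 0), (13.6, 0)]
print(converged(settling, [50] * 6))

# Centroid never moves, but only because every new day was rejected as outliers.
frozen = [(0, 0)] * 6
print(converged(frozen, [50, 0, 0, 0, 0, 0]))
```

    Requiring several informative days in a row, rather than a single small delta, also guards against stopping on one quiet day.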
    [D][R] Deploying deep models on memory constrained devices
    Suppose we want to use a deep learning model on a GPU within our app. We want this model to coexist on the GPU with other processes, effectively limiting its possible resource usage. As cuDNN/cuBLAS routines are nondeterministic and may dynamically allocate a variable amount of memory, how do people manage this problem? Is it a problem at all? Estimating the memory usage of deep learning models on a GPU is notoriously hard. There is a research paper from Microsoft tackling this problem, and they mispredict memory usage by 15% on average. Some CPU BLAS libraries like OpenBLAS or MKL also allocate memory dynamically, but there are alternatives: LAPACK, as far as I know, uses only the memory provided by the caller, making it a viable option for embedded applications. In safety crit…  ( 10 min )
    [D] Best Sequence Embedding Models?
    Which are currently the best Sentence Embedding pre-trained models out there? submitted by /u/Uilxitora [link] [comments]  ( 9 min )
    [D] Using Gamification to demystify the AI black-box
    A blog post about the "black box" nature of AI and how it can be explained and made engaging to users through gamification, illustrated with an example from open-appsec, an open-source machine-learning-based Web Application & API Security product. https://www.openappsec.io/post/using-gamification-to-demystify-the-ai-black-box-in-a-waf-product https://github.com/openappsec/openappsec submitted by /u/onirisapp [link] [comments]  ( 9 min )
    [Project] Startup Job Post/Contractor role
    Hey all! I'm in the throes of doing a startup and looking for someone to help build a legal tech platform. I can discuss more in person, but it is intended to be a human/lawyer-in-the-loop workflow tool for complex contract and deal analysis. The base product is built and deployed. I'm a former developer turned lawyer with 15 years of corporate experience, and I need help/talent/a co-founder to take things to the next level. Ideally you have a mixture of NLP and regular software engineering background and just a very practical approach. If you've played with LLMs, all the better. Options for cash, equity, and larger roles are all on the table. Just looking for the right talent. DM me if you are interested and let's talk about experience, etc.! And it seems that tags are turned off in here, so I'm not sure how to tag something as [Project], but I put it in the title. submitted by /u/pudgyplacater [link] [comments]  ( 9 min )
    [R] RealFill: Reference-Driven Generation for Authentic Image Completion
    Project page: https://realfill.github.io/ Paper: https://arxiv.org/abs/2309.16668 RealFill is able to complete the image with what should have been there. Abstract: Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
    [R] Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes
    https://www.youtube.com/watch?v=aNJWCwG-H_U submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [R] M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec
    Paper : https://arxiv.org/abs/2309.07416 Demo : https://anton-jeran.github.io/MAD/ Code : https://github.com/anton-jeran/MULTI-AUDIODEC submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [R] The Future of Romance: Novel Techniques for Replacing your Boyfriend with Generative AI (parody)
    submitted by /u/TobyWasBestSpiderMan [link] [comments]  ( 9 min )
    Classical NLP course [D]
    Classical NLP course recommendation: can you recommend an NLP course that dives into classical NLP methods? For example: HMM, MaxEnt, the CKY algorithm, syntactic parsing, dependency parsing. submitted by /u/Thick-brain-dude [link] [comments]  ( 9 min )
    [D] Multi-task learning leads to overfitting. Is this the double descent phenomenon?
    I have a CNN model, call it model M. It was trained on dataset A for object pose estimation. After training for 100 epochs, it resulted in these losses: Train: 0.06, Val: 0.08. As dataset A is somewhat limited, I wonder if I can incorporate additional data via a different but related task: object segmentation for similar objects. Model M is a UNet, so I can incorporate this task simply with an additional output channel in the last layer. I add dataset B for object segmentation. During training, M learns on both datasets quite well, which suggests to me that the tasks are well-aligned. After 100 epochs, I get these losses on dataset A: Train: 0.06, Val: 0.16. This is surprising to me. If I get the same training loss on dataset A while training on additional data, I'd expect the validation loss to be lower, since I'm training on 2x the data. Yet the validation loss is consistently higher when I train on both datasets. The only explanation I can think of is the double descent phenomenon. Perhaps when I trained only on dataset A, I was significantly over-parameterized and past the interpolation threshold. So perhaps adding more data brought me closer to the interpolation threshold, leading to worse generalization. Does this explanation seem likely? Has anyone had similar experiences? submitted by /u/murrdpirate [link] [comments]  ( 9 min )
    [D] What's the relationship between Denoising Autoencoders and Diffusion Models?
    Hello, a denoising autoencoder is a model trained to reverse x+n -> x, that is, to recover a clean input x from its noisy version x+n. This seems to be basically the same as a diffusion model, more so if you look at the U-Net diffusion model, which is effectively an information bottleneck. submitted by /u/windoze [link] [comments]  ( 9 min )
    [D] How is this sub not going ballistic over the recent GPT-4 Vision release?
    For a quick disclaimer, I know people on here think the sub is being flooded by people who aren't ML engineers/researchers. I have worked at two FAANGs on ML research teams/platforms. My opinion is that GPT-4 Vision/image processing is out of science fiction. I fed ChatGPT an image of a complex SQL database schema, and it converted it to code, then optimized the schema. It understood the arrows pointing between table boxes in the image as relations, and even understood many-to-one/many-to-many. I took a picture of random writing on a page, and it did OCR better than has ever been possible. I was able to ask questions that required OCR and a geometrical understanding of the page layout. Where is the hype on here? This is an astounding human breakthrough. I cannot believe how much ML is now obsolete as a result. I cannot believe how many computer science breakthroughs have occurred with this simple model update. Where is the uproar on this sub? Why am I not seeing 500 comments on posts about what you can do with this now? Why are there even post submissions about anything else? submitted by /u/corporate_autist [link] [comments]  ( 9 min )
    [P] vLLM with Mistral 7B guide
    Hey all - vllm==0.2.0 got released a couple of hours ago and I put together some code to get it running with the new Mistral 7B model. Also included are some benchmarks for different input batch sizes with the model (output capped at 200 tokens): batch size 1: 46 tokens/s; batch size 10: 400 tokens/s; batch size 60: 1.8k tokens/s. Hope it's useful, let me know if you'd like any more info! Here's the link: https://docs.mystic.ai/docs/mistral-ai-7b-vllm-fast-inference-guide submitted by /u/paulcjh [link] [comments]  ( 9 min )
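    The benchmark figures quoted in the post above are aggregate throughput, so it can help to work out what they imply per sequence. The short calculation below uses only the three numbers from the post, reading "1.8k" as 1800; the latency estimate additionally assumes every sequence generates the full 200-token cap, which the post does not state.

```python
# Batch size -> aggregate tokens/s, as quoted in the post ("1.8k" read as 1800).
benchmarks = {1: 46, 10: 400, 60: 1800}
max_tokens = 200  # output cap mentioned in the post

per_sequence = {b: total / b for b, total in benchmarks.items()}
speedup = {b: total / benchmarks[1] for b, total in benchmarks.items()}
# Assumption: each sequence runs to the 200-token cap.
seconds_per_request = {b: max_tokens / rate for b, rate in per_sequence.items()}

for b in benchmarks:
    print(f"batch={b:>2}  total={benchmarks[b]:>4} tok/s  "
          f"per-sequence={per_sequence[b]:.1f} tok/s  "
          f"speedup={speedup[b]:.1f}x  "
          f"~{seconds_per_request[b]:.1f} s per 200-token reply")
```

    Read this way, batching 60 requests raises total throughput by roughly 39x while each individual sequence slows from 46 to 30 tokens/s.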
  • Open

    Bing AI chat messages are being hijacked by ads pushing malware
    Bing AI chat messages are being hijacked by ads pushing malware. Malvertising has made its way to Bing's chatbot/search engine. Cybersecurity researchers observed a malicious ad being offered as part of the Chat-GPT, AI-powered answer to a search query. Malvertising is a practice where hackers trick ad networks into displaying ads that look legitimate but are actually malicious. Microsoft integrated Chat-GPT into Bing earlier this year and started monetizing it. When a user types in a query, they would get a result paired with sponsored links. In this instance, researchers were given a link that redirected them to a malicious site. Threat actors continue to leverage search ads to redirect users to malicious sites hosting malware. Bing Chat serves some of the same ads seen via a traditional Bing query. Source : https://www.techradar.com/pro/security/bing-ai-chat-messages-are-being-hijacked-by-ads-pushing-malware submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Crafting Virtual Worlds With Just Words. How AI Changes 3D World Building Forever.
    submitted by /u/Magic-Fabric [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Meta AI presents Emu, a quality-tuned latent diffusion model for generating highly aesthetic images. Emu significantly outperforms SDXLv1.0 on visual appeal [Paper]. Meta AI researchers present a series of long-context LLMs with context windows of up to 32,768 tokens. LLAMA 2 70B variant surpasses gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks [Paper]. Abacus AI released a larger 70B version of Giraffe. Giraffe is a family of models that are finetuned from base Llama 2 and have a larger context length of 32K tokens [Details]. Meta announced [Details]: Meta AI - a new AI assistant users can interact with on WhatsApp, Messenger and Instagram. Will also be available on Ray-Ban Meta smart glasses and Quest 3, Meta’s mixed reality h…  ( 12 min )
    I Asked ChatGPT to be my Girlfriend: And it said Yes!
    This is a quick fun project, nothing serious at all, a personalized custom instruction for ChatGPT to make it conversational and maintain character throughout the interaction. If you want to learn more serious and useful prompt engineering techniques head on: r/PromptWizards. In all seriousness, even though this is just for fun, such applied prompt engineering for NPC in games, or online AI companion services can actually be relevant and useful in the future. By initializing this Girlfriend RolePlaying ChatGPT mode, you're not only interacting with an AI but with Sarah, 25, who is keen to explore several layers of a relationship with you. Each command you use brings you a different shade of companionship: /start - Let Sarah introduce herself. /chat - Engages you in a comforting and c…  ( 11 min )
    Exploring Jimmy Apples Claim: "The AGI has been achieved internally" - Detailed Reddit Investigation
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    This week's AI digest
    Planet friendly: Researchers say AI emits up to 1,500 times less CO2 than humans when producing a page of text. Some disagree. Banking: Currently, around 41% of US bank customers are digital only. Meta is launching AI chatbots across its apps to retain younger users. Amazon invests $4 billion in OpenAI competitor Anthropic. Emerging tiger: Nvidia’s CEO bets India will emerge as a major AI market. Regulation: OpenAI CEO Sam Altman advocates for AI regulation despite risks. Suspense: Elon Musk says AI image generation app Midjourney will be releasing “something significant” soon. submitted by /u/unbalanced_mind [link] [comments]  ( 9 min )
    CGPT-4, it is an indisputable fact that Trump waited three hours to end the insurrection violence. What do his supporters guess was his thinking during this time?
    Some posit he was genuinely surprised by the violence and needed time to assess the situation before acting. They argue that the chaotic nature of the events made immediate action complex, given the layers of command and decision-making involved. Others claim he might have been in deliberation with advisors to gauge the scale and implications of intervention, debating the potential backlash from his supporters or the legal ramifications. Another perspective suggests that he might have been contemplating how the events would affect the certification of the Electoral College results, given that his previous legal and political efforts to contest the 2020 election outcome had failed. In this view, he might have been waiting to see if the Congress would be influenced to halt or delay the certification. While some of his supporters may find these explanations plausible, critics argue that the delay represents a dereliction of duty or even tacit support for the violence. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    How to Connect ChatGPT to the Internet (Step-by-Step Guide)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Any "free" ai to turn text to speech?
    I am looking for an ai that will turn the text to speech and be free. submitted by /u/Korti213 [link] [comments]  ( 9 min )
    Looking for some help on a project
    Hey y’all, I’ve been seeing these clips everywhere of AI streamers, and I’ve been searching everywhere for explanations of how to make one. I believe I understand the concepts, but I’m really at a loss for the avatar text-to-speech part. I believe I have it ready for collecting questions and sending them to ChatGPT for a response/script, but I’m very stuck at using a photo for an avatar that can mouth the words and not take 3 minutes per response. Any help is appreciated, I’ve been at this project for longer than I’d like lmao. The attached video is a random YouTube short for reference. submitted by /u/Lipoz69 [link] [comments]  ( 9 min )
    He got Facebook hooked on AI. Now he can't fix its misinformation addiction
    Facebook's addiction to spreading misinformation and hate speech is a result of its AI algorithms. Joaquin Quiñonero Candela, a director of AI at Facebook, was tasked with fixing the problem but was only focused on addressing AI bias. The Responsible AI team failed to make headway against misinformation and hate speech because it never made those problems its main focus. The spread of lies and hate speech on Facebook has only grown, contributing to genocidal campaigns and the promotion of dangerous falsehoods. The algorithms that underpin Facebook's business were designed to maximize engagement, not filter out false or inflammatory content. Source : https://www.technologyreview.com/2021/03/11/1020600/facebook-responsible-ai-misinformation/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Album covers but Morgan Freeman
    submitted by /u/TheGhettoControversy [link] [comments]  ( 9 min )
    Google is expanding its AI-powered search experience to teenagers
    Google's AI-driven search experience, Search Generative Experience (SGE), is now accessible to teenagers aged 13-17 in the U.S. Features include a conversational mode for searches, which Google believes can help young people ask atypical questions and dig deeper. For the latest advancements in AI, look here first. Teen-friendly AI search: SGE introduces a conversational mode to Google Search, allowing users to ask questions and follow-ups in more natural language. To prevent harmful content from surfacing, Google has put guardrails in place, providing stronger protections related to illegal and age-gated substances and to bullying. Features and improving AI accuracy: Google is rolling out "About this result" to give users more context about the displayed content. Google says it is working to keep AI-powered responses from validating false or offensive claims, aiming for higher-quality and more accurate responses. It’s also using large language models to self-critique and rewrite draft responses on sensitive topics based on quality and safety principles. SGE's popularity and future plans: Since its introduction, SGE has been popular, especially among younger users who prefer a conversational approach. Google plans to expand SGE outside the U.S. to India and Japan and improve its services with support for videos, images, local info, and more. It's also experimenting with ads positioned next to the AI-generated responses. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities
    In this analysis, we use a K-nearest neighbors (KNN) model to conduct crop segmentation, and we compare these results with ground truth imagery on an agricultural region. Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. These results are a testament to the power of Planet’s high-cadence geospatial imagery. Agricultural fields change often, sometimes multiple times a season, and having high-frequency satellite imagery available to observe and analyze this land can provide immense value to our understanding of agricultural land and quickly-changing environments.  ( 15 min )
  • Open

    Innovative Endeavors: Meta Introduces AI-Powered Tools and Smart Glasses
    submitted by /u/Allinhalf [link] [comments]  ( 9 min )
    Pruning a specific dimension in a neural network using L1-norm
    I've been playing around with pruning neural networks. One interesting thing I've found is that pruning the weights with the lowest L1-norm along a specific dimension seems to give better results than simply pruning all of the weights with the lowest L1-norm (which I believe is the standard method; for example this is what torch.nn.utils.prune.l1_unstructured does). Does anyone have an explanation for why this might be, or knows of any research in this area? I'm aware that structured pruning removes entire channels in a specific dimension. But I'm referring to unstructured pruning here, where I remove a subset of the weights along a specific dimension. Admittedly I've only done very limited benchmarking of this. See this repo for my implementation, and some benchmark details. submitted by /u/Neilf79 [link] [comments]  ( 9 min )
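    To make the comparison in the post concrete, here is a minimal pure-Python sketch of the two strategies on a toy 2x4 weight matrix. For a single weight the L1-norm is just its absolute value. The function names `prune_global` and `prune_per_row` and the toy weights are illustrative assumptions, not the poster's implementation and not the PyTorch API.

```python
def prune_global(weights, amount):
    """Zero the `amount` fraction of entries with the smallest |w| over the whole matrix."""
    entries = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    k = int(amount * len(entries))  # number of weights to remove
    pruned = [list(row) for row in weights]
    for _, i, j in entries[:k]:
        pruned[i][j] = 0.0
    return pruned


def prune_per_row(weights, amount):
    """Zero the `amount` fraction of entries with the smallest |w| within each row."""
    pruned = []
    for row in weights:
        order = sorted(range(len(row)), key=lambda j: abs(row[j]))
        drop = set(order[: int(amount * len(row))])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Hypothetical toy weights: one row with large magnitudes, one with small ones.
weights = [
    [0.9, -0.8, 0.7, -0.6],
    [0.1, -0.2, 0.3, -0.4],
]
global_pruned = prune_global(weights, 0.5)
row_pruned = prune_per_row(weights, 0.5)
print(global_pruned)  # the second row is removed entirely
print(row_pruned)     # each row keeps its two largest-magnitude weights
```

    In this toy case global pruning wipes out the whole second row, while per-row pruning keeps every row partially alive. That is only one possible intuition for the effect described above, not an established explanation.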
    Help understanding ai, Specificaly cnn cause i want to try training a model on mnist data set as my first project
    Hello, I've learnt the very basics of AI and I'm trying to understand how neural networks work. This is what I have figured out so far. Say I have a 4x4 image, e.g. [0 1 1 0; 1 0 0 1; 1 1 1 1; 1 0 0 1], and I pass a 2x2 kernel over it, e.g. [1 1; 0 3]. Sliding the kernel over the image, the dot product of the first window [0 1; 1 0] with the kernel [1 1; 0 3] is 1, and if we do that for all windows we get a new matrix [1 2 4; 4 1 3; 5 4 4]. Then we have padding "same", so this becomes [0 0 0 0; 1 2 4 0; 4 1 3 0; 5 4 4 0]. We then turn it into a feature map, basically flattening it to something like 0,0,0,0,1,2,4,0,4,1,3,0,5,4,4,0, so the input has 16 features. If we have a layer of 3 neurons that fire with a ReLU activation function, and the weights alternate between 1 and 2 for simplicity's sake, we would compute 0*1 + 0*2 + 0*1 + ... + 4*2 + 0*1 = 32. With ReLU, we would check: is 32 > 0? If so we pass 32 to the next neuron, and if not we pass 0? I don't know the rest, I guess I forgot what uni taught me 😅 Here's a diagram I drew; maybe you can help me figure out the rest. I'm confused about how the output layer works, I guess. submitted by /u/SaadPaad2003 [link] [comments]  ( 9 min )
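    The steps in the question can be checked with a small pure-Python sketch: slide the 2x2 kernel over the 4x4 image without padding, flatten the result, and feed it to one ReLU neuron. The helper names `conv2d_valid` and `relu` are illustrative assumptions, and the alternating 1/2 weights follow the post's simplification. Note that "same" padding is normally applied to the input before the convolution, not to the output afterwards, so this sketch uses the unpadded case only.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x):
    """ReLU passes positive values through and maps everything else to 0."""
    return max(0, x)


image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
kernel = [
    [1, 1],
    [0, 3],
]

feature_map = conv2d_valid(image, kernel)          # 3x3 output
features = [v for row in feature_map for v in row]  # flatten to 9 features
weights = [1 if i % 2 == 0 else 2 for i in range(len(features))]
activation = relu(sum(f * w for f, w in zip(features, weights)))

print(feature_map)  # [[1, 2, 4], [4, 3, 4], [2, 2, 5]]
print(activation)   # 39
```

    The first row matches the hand calculation in the post; the second and third rows come out differently here, so those windows may be worth rechecking. In a full network each of the 3 neurons would have its own weights and a bias, and the output layer would repeat the same weighted-sum step on their activations.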
    help understanding basics of neural networks, cnn's to be exact
    Hello, I've learnt the very basics of AI and I'm trying to understand how neural networks work. This is what I have figured out so far. Say I have a 4x4 image, e.g. [0 1 1 0; 1 0 0 1; 1 1 1 1; 1 0 0 1], and I pass a 2x2 kernel over it, e.g. [1 1; 0 3]. Sliding the kernel over the image, the dot product of the first window [0 1; 1 0] with the kernel [1 1; 0 3] is 1, and if we do that for all windows we get a new matrix [1 2 4; 4 1 3; 5 4 4]. Then we have padding "same", so this becomes [0 0 0 0; 1 2 4 0; 4 1 3 0; 5 4 4 0]. We then turn it into a feature map, basically flattening it to something like 0,0,0,0,1,2,4,0,4,1,3,0,5,4,4,0, so the input has 16 features. If we have a layer of 3 neurons that fire with a ReLU activation function, and the weights alternate between 1 and 2 for simplicity's sake, we would compute 0*1 + 0*2 + 0*1 + ... + 4*2 + 0*1 = 32. With ReLU, we would check: is 32 > 0? If so we pass 32 to the next neuron, and if not we pass 0? I don't know the rest, I guess I forgot what uni taught me 😅 Here's a diagram I drew; maybe you can help me figure out the rest. I'm confused about how the output layer works, I guess. https://preview.redd.it/h07o5y6847rb1.png?width=1859&format=png&auto=webp&s=df1cdf73ea64ff93ac872dfe8248722e8befd31d submitted by /u/WranglerParty5452 [link] [comments]  ( 9 min )
    Adapt GAN
    Hi everyone, I'm new to neural networks and I wanted some advice. I want to generate grayscale images with certain properties: distribution of pixel values, spatial correlation between pixels, etc. I already know the type of result that I need, but I wanted to know if a neural network, especially a GAN, is capable of producing images that fit my requirements. I was thinking that maybe I could change the GAN architecture as follows: 1) the real data inputs (normally images fed to the discriminator) will simply be the statistical parameters that I am expecting; 2) I'll add a measurement of the various statistical parameters on all the synthetic images generated; 3) finally, the discriminator will base its weight updates only on the comparison of the statistical parameters. Does such a network make sense? If so, I have trouble finding a way of implementing it, but that is another story. Right now I want to know if this is doable. If not, do you have any alternative suggestions for my issue? Thanks all for your advice! submitted by /u/Hectorite [link] [comments]  ( 9 min )
    Why Batch Norm Works
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 9 min )
  • Open

    Meta's Technological Marvel: AI-Powered Tools and Intuitive Smart Glasses
    submitted by /u/Allinhalf [link] [comments]  ( 9 min )
    Why is dyna Q not outperforming Q learning in terms of sample efficiency?
    I coded a dyna Q implementation based on the algorithm given in Sutton's book over here. However, it seems like both are equally sample efficient on the cliff walking environment. Here is my code. These are my results - ​ ​ https://preview.redd.it/z7xwow5hz7rb1.png?width=585&format=png&auto=webp&s=90b33eb4c754e199e9bf15499a78e0f42e05f5d2 The only think that came to my mind was to increase the model sampling rate (`n_iters`). Even after assigning a large value to it, the performance doesn't change. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
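    For reference, here is a minimal tabular Dyna-Q sketch showing where the planning loop (`n_iters`, named after the post's parameter) sits relative to the real Q-learning update. It is not the poster's code: the 6-cell corridor environment, the hyperparameters, and all function names are assumptions chosen to keep the example self-contained.

```python
import random

N_STATES = 6      # corridor cells 0..5; cell 5 is the goal (hypothetical toy environment)
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Deterministic corridor: reward 1 on reaching the goal, 0 otherwise."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick a best action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def dyna_q(episodes=50, n_iters=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}  # (state, action) -> (next_state, reward, done), learned from real steps
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap so an episode always terminates
            explore = rng.random() < eps
            action = rng.choice(ACTIONS) if explore else greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Direct RL: ordinary Q-learning update from the real transition.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            model[(state, action)] = (nxt, reward, done)
            # Planning: replay n_iters transitions sampled from the learned model.
            for _ in range(n_iters):
                (s, a), (s2, r, d) = rng.choice(list(model.items()))
                t = r + (0.0 if d else gamma * max(q[s2]))
                q[s][a] += alpha * (t - q[s][a])
            state = nxt
            if done:
                break
    return q


q_table = dyna_q()
policy = [greedy(q_table, s, random.Random(0)) for s in range(N_STATES - 1)]
print(policy)  # greedy action per non-goal cell
```

    Calling `dyna_q(n_iters=0)` skips the planning loop and reduces to plain Q-learning, which gives a like-for-like baseline when comparing real environment steps rather than wall-clock time.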
    How can I config and build MJPC c++ software?
    I'm trying to install and run this open-source project https://github.com/google-deepmind/mujoco_mpc. It's called MJPC, and it's a C++ software that displays a real-time interactive interface. I've cloned the code, installed CMake, and gcc version 13.1.0 to run C++20. I've also installed the CMake Tools and C/C++ extensions in VSCode as instructed. However, I'm not sure what to do next. I have no experience with C++ and software coding, configuring in VSCode, or building it. Please help me if you can, provide detailed guidance. submitted by /u/Nghiattk27 [link] [comments]  ( 9 min )
    LLM Agents for RL envs
    Has anyone here tried using LLM Agents to solve RL environments? I'm curious about your experiences. Considering that performing a single action involves a chain of thoughts, how fast did your experiments go? Please feel free to add any additional comments about this. Cheers! submitted by /u/stinoco [link] [comments]  ( 9 min )
    Shape Formation with Multi-Agent Reinforcement Learning
    Hey everyone, I'm trying to write MARL code with a MAPPO policy to train three agents to form a triangle shape. I'm relatively new to RL, having completed the fundamentals, but I'm struggling to find suitable resources that can teach me how to implement this in Python. I'd be really grateful if someone could share some insights or useful resources where I can learn to code and implement MARL. submitted by /u/The_One263 [link] [comments]  ( 9 min )
    Curiosity/ Exploration with Rllib
    Hi! I’ve been training a MultiAgentEnv with Curiosity, but I’d like to extend my action space to be a Dictionary. Are there any similar modules I could use instead or is there any way to use Curiosity with a Dictionary consisting of a Box and a Discrete action space. Thank you! submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
  • Open

    Regular solids and Monte Carlo integration
    Monte Carlo integration is not as simple in practice as it is often introduced. A homework problem might ask you to integrate a function of two variables by selecting random points from a cube and counting how many of the points fall below the graph of the function. This would indeed give you an estimate […] Regular solids and Monte Carlo integration first appeared on John D. Cook.  ( 6 min )
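    The homework-style setup described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the integrand takes values in [0, 1] on the unit square so that the unit cube bounds its graph; the function name `hit_or_miss_integral` and the test integrand f(x, y) = x*y are my own choices, not from the post.

```python
import random


def hit_or_miss_integral(f, n_samples, seed=0):
    """Estimate the integral of f over [0,1]^2, assuming 0 <= f <= 1 there.

    Draw points (x, y, z) uniformly from the unit cube and count the fraction
    that fall below the graph z = f(x, y). The cube has volume 1, so that
    fraction is the estimate of the integral.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y, z = rng.random(), rng.random(), rng.random()
        if z < f(x, y):
            hits += 1
    return hits / n_samples


# The exact value of the integral of x*y over the unit square is 1/4.
estimate = hit_or_miss_integral(lambda x, y: x * y, 100_000)
print(estimate)
```

    The error shrinks only like 1/sqrt(n_samples), so each extra decimal digit of accuracy costs roughly 100 times more samples.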
  • Open

    Who will benefit from AI?
    In campus talk, Daron Acemoglu offers vision of “machine usefulness,” rather than autonomous “intelligence,” to help workers and spread prosperity.  ( 11 min )
  • Open

    Heeding Huang’s Law: Video Shows How Engineers Keep the Speedups Coming
    In a talk, now available online, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore’s law era. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems Read article >  ( 6 min )

  • Open

    Cross-platform way to enter Unicode characters
    The previous post describes the hoops I jumped through to enter Unicode characters on a Mac. Here’s a script to run from the command line that will copy Unicode characters to the system clipboard. It runs anywhere the Python module pyperclip runs. #!/usr/bin/env python3 import sys import pyperclip cp = sys.argv[1] ch = eval(f"chr(0x{cp})") print(ch) […] Cross-platform way to enter Unicode characters first appeared on John D. Cook.  ( 5 min )
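    The excerpt above is cut off before the clipboard step, so the following is only a sketch of how such a script might look, not the author's full version. It swaps `eval` for `int(cp, 16)`, which parses the hex code point directly without executing arbitrary input. The `pyperclip.copy` call is my assumption about how the copy step would be done, and the function name `codepoint_to_char` is hypothetical.

```python
#!/usr/bin/env python3
import sys


def codepoint_to_char(hex_codepoint):
    """Convert a hex code point string such as '03c0' to its character."""
    return chr(int(hex_codepoint, 16))


def main(argv):
    ch = codepoint_to_char(argv[1])
    print(ch)
    try:
        import pyperclip  # third-party module; assumed to be installed
        pyperclip.copy(ch)  # place the character on the system clipboard
    except ImportError:
        print("pyperclip not installed; character was printed only", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

    Saved under a hypothetical name such as `unicode_clip.py`, it would be run as `python3 unicode_clip.py 03c0` to print π and copy it to the clipboard.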
    Using Unicode on MacOS
    Setting up Unicode on my MacBook took some research, so I’m leaving myself a note here if I need to do it again. Maybe it’ll help someone else too. From the System Settings dialog, go to Keyboard and click the Edit button next to Input Sources. Click on the + sign in the lower left […] Using Unicode on MacOS first appeared on John D. Cook.  ( 5 min )
  • Open

    The Creator (2023) movie discussion
    In theaters now. PG-13. Synopsis from Fandango (mild spoilers) From writer/director Gareth Edwards (“Rogue One,” “Godzilla”) comes an epic sci-fi action thriller set amidst a future war between the human race and the forces of artificial intelligence. Joshua (John David Washington, "Tenet"), a hardened ex-special forces agent grieving the disappearance of his wife (Gemma Chan, "Eternals"), is recruited to hunt down and kill the Creator, the elusive architect of advanced AI who has developed a mysterious weapon with the power to end the war… and mankind itself. Joshua and his team of elite operatives journey across enemy lines, into the dark heart of AI-occupied territory… only to discover the world-ending weapon he’s been instructed to destroy is an AI in the form of a young child (newcomer Madeleine Yuna Voyles). Trailer If there is any other media I should make threads for just let me know- could be video games, television, etc. submitted by /u/jaketocake [link] [comments]  ( 9 min )
    Aryn comes out of stealth to bring GenAI to OpenSearch and data preparation
    Aryn, a team with experience in AWS big data and database services, has come out of stealth and raised $7.5M in series seed funding. Their mission is to bring generative AI to OpenSearch and data preparation. They aim to use generative AI models to process unstructured data for tasks such as information extraction, question-answering, summarization, and content generation. Aryn's conversational search approach empowers users to interact with their unstructured enterprise data. They have developed a conversational search stack consisting of a semantic data preparation system called Sycamore, semantic search with OpenSearch, and conversational capabilities in OpenSearch. Generative AI powers each component of the stack, leading to higher quality answers and ease of use. Developers can quickly build and deploy applications like question-answering, chatbots, and research platforms using Aryn's stack without needing expertise in AI and search. Aryn's stack is 100% open source, making it accessible to developers. Source : https://blog.aryn.ai/aryn-bringing-generative-ai-to-opensearch-and-data-preparation submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Why does this read like someone used chatdev and gave it a marketing agent named clover with access to a reddit account?
    submitted by /u/Lesbianseagullman [link] [comments]  ( 9 min )
    Meta Unfolds a 'Universe of AI' Across Instagram, Facebook, and WhatsApp
    Meta has unveiled major AI updates across its platforms that would fundamentally alter user experiences on Instagram, Facebook, and WhatsApp, opening up a "universe of AI" solutions. For the latest advancements in AI, look here first. https://preview.redd.it/bl81rlbqp1rb1.png?width=2048&format=png&auto=webp&s=be44b8ebae8f65b53eb82fe2a78b45f19260c452 Spearheading the AI Universe - Meta AI Chatbot: The “advanced conversational assistant” is set to enhance Messenger, WhatsApp, and Instagram services and will be incorporated into the upcoming Ray-Ban Meta smart glasses and Quest 3. Real-time information capabilities have been bolstered through a partnership with Microsoft Bing, and image generation is powered by a new model, Emu. A Galaxy of AI Personalities: Meta rolled out 28 AIs in beta, featuring personas such as Snoop Dogg, Tom Brady, Kendall Jenner, and Naomi Osaka, to make the experience more interactive. AI Studio - Empowering Businesses: The AI Studio platform is designed to let businesses build AI chatbots for messaging services on Facebook, Instagram, and Messenger. Meta will also provide a sandbox tool in the upcoming year for users to experiment with creating their own AI. Generative AI Stickers - A New Co-creating Experience: AI editing tools will allow users to edit images and co-create content with friends. The tool uses Llama 2 and the new image generation model, Emu, to convert text prompts into stickers in seconds. Ray-Ban Smart Glasses with Meta AI: The Ray-Ban smart glasses are equipped with Meta AI, allowing users to receive information, spark creativity, and control the glasses using just their voice. (source) P.S. If you like this kind of analysis, I write a free newsletter with the latest and most impactful news in AI. Professionals from Google, Meta, and OpenAI read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Get a job as a Prompt Engineer - Challenge: generate SAT-Style Multiple Choice Questions.
    One member on r/PromptWizards just applied for a job as a Prompt Engineer in a company, and they tasked him to craft a prompt system that generates high-quality SAT-style multiple-choice questions. Quite a quest, right? Well, stick around, and we'll take a deep dive into the prompt engineering we used to help him. The mission was precise: Write a prompt to yield an SAT-style multiple-choice question that rigorously tests a student's understanding of "Algebraically solving systems of 2 linear equations in 2 variables". The challenge didn't end there; the question produced had to meet the hard/difficult mark set by real SAT questions. Using the OpenAI Playground, we conducted incisive iterations, testing each prompt separately to mitigate any bias from prior outputs. Our approach was: - …  ( 11 min )
    Warner on AI regulation: ‘We probably can't solve it all at once’
    submitted by /u/smo279 [link] [comments]  ( 9 min )
    Courses for more Seniors
    Hello all, what course would you recommend for those of us who are older and already settled in other careers? For example, I'm 35 and a manager, so I wouldn't need a course to actually design AI or anything. It would be more about understanding how and where to implement it in an organisation. Any tips? Cheers and merci. submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Show-1: Marrying Pixel and Latent Diffusion Models for Efficient and High-Quality Text-to-Video Generation
    A new paper proposes Show-1, a hybrid model that combines pixel and latent diffusion for efficient high-quality text-to-video generation. Both of these approaches have tradeoffs, so researchers at the National University of Singapore tried a hybrid approach combining both, and shared the results in a paper published yesterday. My highlights from the paper: (1) pixel diffusion excels at low-res video generation precisely aligned with text; (2) latent diffusion acts as an efficient upsampling expert from low to high res; (3) chaining the two techniques inherits the benefits of both; (4) Show-1 achieves strong alignment, quality, and 15x less inference memory. The key is using pixel diffusion for the initial low-resolution stage. This retains alignment with text descriptions. Latent diffusion then serves as a super-resolution expert, upsampling efficiently while preserving fidelity. By blending complementary techniques, Show-1 moves past tradeoffs limiting the individual models. More details here. Paper is here (includes links to example generations). submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    What AI makes images that subtly form a word like this one?
    submitted by /u/samuraiogc [link] [comments]  ( 9 min )
    Getting emotional with LLMs can increase performance by 115% (Case Study)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Question about a small project
    My sister and I have a small project we are thinking about working on. The idea is basically that we are going to enter the same prompt, separately, into an image-generating AI of some sort (DALL-E 2 etc.) over a period of time and hopefully see the results change. We would probably pick words or phrases that are topical and debated. This only works, though, if the AI isn't just trained on old data and has an active connection to the internet. My question is therefore: is there an AI right now that would fit the task? Sorry if the question is dumb or I didn't explain myself clearly! submitted by /u/Mejwynn [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/27/2023
    ODIN integrates Large Language Models (LLMs) into Obsidian using LangChain, allowing you to ask questions about the data stored in your knowledge graph right from the prompt bar.[1] ChatGPT users can now browse internet, OpenAI says.[2] Adobe’s Photoshop on the web launch includes its popular desktop AI tools.[3] The White House plans to introduce a highly anticipated executive order in the coming weeks dealing with artificial intelligence, President Joe Biden said Wednesday.[4] Sources: [1] https://github.com/memgraph/odin [2] https://www.reuters.com/technology/openai-says-chatgpt-can-now-browse-internet-2023-09-27/ [3] https://www.theverge.com/2023/9/27/23892889/adobe-photoshop-for-the-web-firefly-ai-generative-fill-full-release-price-date [4] https://www.cnn.com/2023/09/27/tech/joe-biden-executive-order-artificial-intelligence/index.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Multimodal AI's will cause people to embrace their own reality bubbles and that is bad news for dictatorships
    I have been messing with Llama, trying to make a script to make a movie. I sort of realize it is not there yet: it can write long, incoherent stories or whatever you want. You can couple it with Stable Diffusion to make images, which would have to be described better to fit the "movie" or narrative. It is not there yet. ChatGPT can already do this: you can ask it to tell you a story and describe the visual scenes. At the same time, we are getting audio generation from things like audioldm2 and stableaudio etc. Multimodal AIs are almost here. Pretty soon we will have devices in our pockets powered by AI chips that will be able to generate whatever reality we want. We can feed them images from our past and they can allow us to live in a VR reality of the past. Or we can choose to live in anot…  ( 10 min )
    Jazz Fusion (AI Generated DnB & Jazz music and video)
    submitted by /u/LibeerCZ [link] [comments]  ( 9 min )
  • Open

    Modern reinforcement learning for video game NPCs
    submitted by /u/akliyen [link] [comments]  ( 9 min )
    Reinforcement learning in automating game testing
    The role of reinforcement learning in automating game testing is becoming increasingly crucial, making it more efficient and effective. Manual testing, while essential, is extremely time-consuming and subject to human error. Our open-source library SheepRL 🐑 can be used to test whether the game dynamics are well defined: what if a player can finish the game with just a few moves? 🎮 This video shows that our agent (Kasumi, on the left) is able to win the game in the hardest mode by crouching and throwing kicks. 🥊 This can help a game developer to: understand where and how to intervene to achieve a more playful game; predict and correct bugs early in the game development process; enhance the gaming experience and final product quality; and reduce time and resources spent on debugging. The game has changed 🔥 and it is up to us to play it with (human + artificial) intelligence! Thanks to u/DIAMBRA_AIArena for the video! --- ❌ Are you interested in joining the project community? Get in touch ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI. Feel free to use it for your Artificial Intelligence projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ https://reddit.com/link/16uht6v/video/ve3derxsc0rb1/player submitted by /u/Manu_Orobix [link] [comments]  ( 9 min )
    Proofs in the original Q-Learning technical notes
    I'm not sure it's the right place for this, but I was going through the proofs in the "original" 1992 technical notes of Q-learning, and a couple of points raised some questions: 1) In the Proof of lemma B.4: https://preview.redd.it/7g6pputdqwqb1.png?width=1006&format=png&auto=webp&s=fe4afeac3b06deee6c80105b280a0085bdcfbe51 where do P_{xy}^2(a_2) and R_x(a_2) come from? If we apply the definitions of Q'(x, a_1, a_2) and Q(x, a_1, a_2) to get the bound, P_{xy}^2(a_2) and R_x(a_2) should not be there. Are they just notation errors or is it correct and I'm missing something? ​ 2) I don't quite get how the bounds on P and R are computed in Section 3.2: https://preview.redd.it/p06ysjewqwqb1.png?width=962&format=png&auto=webp&s=a5929e701099dc6e4543efe7681f96f12f543fa8 Considering the results in B.4 (i.e., the bounds for the distance between the chain's P, R and the real ones), I don't understand how they arrive at this conclusion. ​ I'd greatly appreciate any intuitions about these, or if someone can point me in the right direction :) submitted by /u/Beautiful_Zebra_198 [link] [comments]  ( 9 min )
  • Open

    [N] We Collaborated with Outerbounds to Enable HPC and Ray Integration in Metaflow
    Here is our blog post - please check it out: https://forums.autodesk.com/t5/engineering-hub-blog/autodesk-and-outerbounds-partner-to-open-source-ray-and-hpc/ba-p/12254816 And try out the metaflow-ray extension here: https://github.com/outerbounds/metaflow-ray submitted by /u/rirhun [link] [comments]  ( 9 min )
    [D] What are the options for the most human TTS?
    So I’ve been using elevenlabs but it burns through characters really fast. What are the best options for the most human sounding TTS available? I’ve been looking into tortoise, but would like to see if there are other options I should be looking into. submitted by /u/Long8D [link] [comments]  ( 9 min )
    [D] How do we know Closed source released benchmarks aren't being heavily optimized, through outside means?
    I've recently started working with ML and NLP, so I'm sorry if this sounds naive. Unlike Llama 2 or other open-source models, we don't have access to the model weights for GPT-4, Claude or Bard, so benchmark evals are being run through either APIs or the chat interface. So how do we know that the model isn't being boosted by custom web-searching abilities or RAG? While GPT-4 might have a turn-off option, I'm pretty sure Bard is always online, being built by Google. So how do we trust benchmarks? Also, have any open-source models been tested with web search/RAG? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [R] Searching for a regression dataset with structure in its prediction
    I am searching for a relatively simple dataset to train a regressor that has some structure in its predictions. It can't be too tiny because I have to try out a NN architecture. It must have at least some continuous features but can also have additional categorical or related discrete structures. I usually work with vision tasks, so I am not sure if I am missing something obvious I could try. Open to ideas! I thought about predicting rows in some tabular dataset. Anything suitable that comes to mind? submitted by /u/LeanderKu [link] [comments]  ( 9 min )
    [N] CUDA Architect and Cofounder of MLPerf: AMD's ROCm has achieved software parity with CUDA
    Greg Diamos, the CTO of startup Lamini, was an early CUDA architect at NVIDIA and later cofounded MLPerf. He asserts that AMD's ROCm has "achieved software parity" with CUDA for LLMs. Lamini, focused on tuning LLMs for corporate and institutional users, has decided to go all-in with AMD Instinct GPUs. https://www.crn.com/news/components-peripherals/llm-startup-embraces-amd-gpus-says-rocm-has-parity-with-nvidia-s-cuda-platform submitted by /u/makmanred [link] [comments]  ( 9 min )
    [P] Request to test Mirage: A platform to search and generate images, videos, audio, and 3D assets using natural language
    Mirage is the infinite asset library that helps you find or create the perfect digital asset. 🗨️ Just Search Naturally: No awkward keywords—Mirage understands you. 🤖 State-of-the-Art Models: Can't find it? Generate it, thanks to open-source models. 🔍 Similarity Search: Discover more of what you love with a single click. 🤗 Fully Personalized: Our AI librarian learns your style to show you assets you'll dig. Website Link: MirageML Open-Source Github: Github Development Status: Beta I would love to get some honest feedback! submitted by /u/perception-eng [link] [comments]  ( 9 min )
    [P] Request to test Domeis: A new platform for no-code Machine Learning
    Domeis is a no-code Machine Learning platform that offers a dashboard to design, train and test Machine Learning algorithms, as well as to import, pre-process and cleanse data, all from the Graphical User Interface and without writing a single line of code. The aim of this platform is two-fold: Making Machine Learning accessible to anyone, not just Data Scientists or experienced software developers. By offering the possibility to design, train and test Machine Learning models directly via the GUI, being an experienced software developer is no longer a pre-condition for creating ML models. Making Machine Learning model creation, training and testing faster for experienced Data Scientists / Machine Learning Engineers. By drastically reducing the time needed to set up environments, import data and define models, Domeis allows Machine Learning practitioners to focus on trying out and comparing different models/approaches. Website Link: https://www.domeis.it/ Development Status: Alpha I would love to get some honest feedback! submitted by /u/Ok_Hold_5385 [link] [comments]  ( 9 min )
    [D] Help understanding convergence proof (Adaptive learning rate + Momentum)
    Hello everyone, I am trying to understand the convergence analysis/derivation of the momentum algorithm, or the stochastic heavy ball algorithm, using the regret bound analysis from different research papers. https://ieeexplore.ieee.org/document/7330562 - Page3 https://www.mdpi.com/2504-3110/6/12/709 - Page6 http://arxiv.org/abs/1707.01647 - Page4 ​ In the derivation, there is the following simplification, which I do not understand at all ​ $\frac{2\boldsymbol{\eta}_{k}}{(1-\beta)}\sum_{k=0}^{T}\left|J(\theta_k) - J(\theta^*)\right| + \frac{2\boldsymbol{\eta}_{k}\beta}{(1-\beta)^2} \sum_{k=0}^{T}\left|J(\theta_k) - J(\theta_{k-1})\right| \leq \ \left|\boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^* \right|^2 - \left|\boldsymbol{\theta}_{T+1} + \boldsymbol…  ( 9 min )
    [D] Any researchers or institutions in the USA that follow AI-compression relationships, specifically like DeepMind?
    I have tried to follow the main collaborators of Hutter and other prominent scientists to track this, but they are mostly in Europe, with some in Australia. American institutions seem to be more interested in more open ai like deep neural networks. If anyone is familiar with any US-based institutions that do notable work in this line, please comment. submitted by /u/Netero1999 [link] [comments]  ( 9 min )
    [R] Brain Tumor segmentation
    Can any of you suggest computer science research ideas related to brain tumor segmentation using UNet? submitted by /u/Eleonora467 [link] [comments]  ( 9 min )
    [P] BionicGPT - ChatGPT replacement that lets you run RAG on confidential data
    BionicGPT is an open source WebUI that gives enterprises the ability to run Retrieval Augmented Generation (RAG) on their on-premise documents. To allow people to get up to speed, we deploy with a quantized 7B model that runs on CPU. Github Repo: https://github.com/purton-tech/bionicgpt We basically implement a RAG pipeline including document upload, embeddings generation and subsequent retrieval. Feedback: We'd love to get some feedback in the form of GitHub issues or comments here. Screenshot: https://preview.redd.it/uiw0wqul30rb1.png?width=2447&format=png&auto=webp&s=8ad7e61ed048258c19aa63bf7c94d12da5b721fa submitted by /u/purton_i [link] [comments]  ( 9 min )
    [N] First Impressions with GPT-4V(ision)
    My colleague Piotr and I have been testing GPT-4V(ision) over the last day. We wrote up our findings, covering how GPT-4V performs on: Visual question answering (VQA) across a range of domains (locations, movies, plants) OCR Math OCR Object detection And more TL;DR: GPT-4V performed well for VQA and document OCR but struggled with OCR on real-world images and object detection (where we asked for bounding boxes). https://blog.roboflow.com/gpt-4-vision/ I would love to hear what other people have found working with GPT-4V. submitted by /u/zerojames_ [link] [comments]  ( 9 min )
    [R] NUS: Results of Combining Pixel and Latent Diffusion Models for Text-to-Video Generation
    A new paper proposes Show-1, a hybrid model that combines pixel and latent diffusion for efficient high-quality text-to-video generation. Both of these approaches have tradeoffs, so researchers at the National University of Singapore tried a hybrid approach combining both, and shared the results in a paper published yesterday. My highlights from the paper: Pixel diffusion excels at low-res video generation precisely aligned with text Latent diffusion acts as efficient upsampling expert from low to high res Chaining the two techniques inherits benefits of both Show-1 achieves strong alignment, quality, and 15x less inference memory The key is using pixel diffusion for the initial low-resolution stage. This retains alignment with text descriptions. Latent diffusion then serves as a super-resolution expert, upsampling efficiently while preserving fidelity. By blending complementary techniques, Show-1 moves past tradeoffs limiting the individual models. More details here. Paper is here (includes links to example generations). submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Linear Regression Queries [D]
    I am a beginner in Data Science. I have recently enrolled in the supervised machine learning course by Andrew Ng on Coursera. I am now familiar with linear regression and gradient descent. However, I ran into an issue. In the optional lab, there was a task to calculate the value of the cost function using gradient descent for linear regression. I wrote the code in my notebook by myself and cross-checked it to be correct. However, my values of w,b are very different from the expected output, even though the cost function yields a better result in my code. Another thing I noticed is that we only have to scale the x variables, leaving the values of y as they are. I have two major queries now: Is getting different w,b values fine as long as the cost function is at its minimum? (w is a numpy array) Why do we scale the x variables only? Why don't we scale the y variables? Thanks in advance. submitted by /u/healing_you [link] [comments]  ( 9 min )
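One way to look at the two queries above is a small experiment. The sketch below is not the course lab; it assumes a single feature, z-score scaling, and made-up data, and every name in it (`fit_gd`, `unscale`, the sample values) is hypothetical. It fits on the scaled feature and then maps the parameters back to the original units, which suggests that (w, b) learned on scaled inputs are expected to look different from (w, b) on raw inputs while describing the same line.

```python
def fit_gd(xs, ys, lr, steps):
    """Batch gradient descent for y ~ w*x + b with squared-error cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def cost(xs, ys, w, b):
    """Half mean squared error, the cost used in most intro courses."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def unscale(w_scaled, b_scaled, mean, std):
    """Convert parameters learned on (x - mean) / std back to raw-x units."""
    w = w_scaled / std
    b = b_scaled - w_scaled * mean / std
    return w, b


# Made-up data on the line y = 0.2 * x (an assumption for illustration).
xs = [1000.0, 1500.0, 2000.0, 2500.0]
ys = [200.0, 300.0, 400.0, 500.0]

mean = sum(xs) / len(xs)
std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
xs_scaled = [(x - mean) / std for x in xs]

# Parameters in scaled space differ from the raw-space ones...
w_scaled, b_scaled = fit_gd(xs_scaled, ys, lr=0.1, steps=2000)
# ...but they describe the same line once converted back.
w_raw, b_raw = unscale(w_scaled, b_scaled, mean, std)
final_cost = cost(xs, ys, w_raw, b_raw)
```

Scaling x mainly changes how well-conditioned the gradient steps are; the targets y do not enter the gradient as a multiplier on the parameters, which is one common reason they are left unscaled in simple regression labs.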
    [P] Hands-on open-source workflows for voice AI
    Hey r/MachineLearning, we made a tutorial that showcases typical workflows and tooling for voice analytics applications. The tutorial is intended for intermediate-level ML practitioners. The walkthrough is purely based on open source software and covers: Efficient interactive data exploration and inspection Dataset handling and inference on pre-trained models Model debugging and identification of critical data clusters Model comparison and selection ​ https://i.redd.it/j15gk3kkgyqb1.gif 🔗 Blog with code: https://medium.com/p/dbfd923a5a79#432e-3559ae606f80 🤗 Interactive demo: https://huggingface.co/spaces/renumics/emodb-model-debugging ​ ​ submitted by /u/44sps [link] [comments]  ( 9 min )
    [D] CV annotations and work with COCO/YOLO dataset
    Hi everyone. In my job I work with a lot of data for Computer Vision, and I use Label Studio for annotations. But the last time I've worked with it, I lost some of my annotations, which I need for other purposes. I have the final result as a YOLO and COCO dataset, but I cannot import the results from them to recover all I need. Can you suggest me good applications with an intuitive UI to import the COCO or YOLO dataset and work with labels? submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
    [P] Request to Test PyMilo: A New Python Library for Machine Learning I/O
    PyMilo is an open-source Python package that offers an efficient, safe, and transparent method for transporting pre-trained machine-learning models. The motivation for developing this package is to eliminate the risks of binary or pickle formats. As this library is still in its early stages of development, it currently supports only a limited number of machine learning models provided by Scikit-learn. Nevertheless, it would be valuable if the community used this library and gave us feedback on improving the package's interface and prioritizing future development. Your cooperation would be invaluable to us. In the following, I provide an example of how to utilize PyMilo. GitHub Repo: https://github.com/openscilab/pymilo Development Status: Alpha Simple Linear Mode…  ( 9 min )
    [Discussion] Interesting interview question
    Was asked something similar to the following question in an interview for an ML role and was curious how others would answer this: Say you have a dataset with one feature column and one label column (with different classes). Assume this data is too large to fit into memory and could be infinite in size (e.g. data is coming in as a stream). How would you train an ML model on this data to accurately predict the label? Followup: instead of one feature column, what if you had several thousand? How would you decide which features to use given the size of the dataset? I discussed online sampling (reservoir sampling, etc.) as a way to get a training dataset that could fit in memory and continually train on that, but the interviewer did not seem convinced. Any thoughts? submitted by /u/scpdstudent [link] [comments]  ( 9 min )
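For reference, the reservoir sampling idea mentioned in the post can be sketched as follows. This is the classic "Algorithm R" for keeping a uniform random sample of fixed size from a stream of unknown length; the function name and the toy stream are assumptions for illustration, and it is not claimed to be the answer the interviewer was looking for.

```python
import random


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stream of (feature, label) pairs; only k of them are ever held in memory.
stream = ((x, x % 3) for x in range(10_000))
sample = reservoir_sample(stream, k=100, rng=random.Random(0))
```

The sample only ever uses memory proportional to `k`, so a model can be refit on it periodically; an alternative worth weighing is a learner that updates incrementally on each mini-batch of the stream instead of sampling.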
    [D] What appropriate loss function to use for "Search recall" optimization?
    I'm studying the application of ML to improve searches. Here's a couple of example scenarios: Document retrieval (search) system: We have a (source) document with us and we're trying to find a matching document in a database. The source document has text and image attributes - for simplicity let's say a title and a single image. Each search result will also be a document - with a title and at most one image. A search engine: We have a query comprised of both text and an image (like google image search allows text to be added to the query as well). Each search result will be a website with text and image attributes (for simplicity, webpage title and at most one image) More generally, I have a search system - whatever we're trying to search for has text and an image associated with it…  ( 11 min )
    [D] How Does Your Organization Approach Machine Learning Projects Phase by Phase?
    How does the development process of a Machine Learning project unfold phase-by-phase within your organization? Could you please specify the type of organization you are, the time spent on each phase, as well as any aspects you consider to be weak or fundamental? It would also be great if you could share any tips or tricks you've learned that have changed your perspective. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
    [P] Rubik's Cube Square Detection
    Hello everyone, I am trying to detect the 9 squares of a face of a Rubik’s Cube through a camera. The idea is to use my computer camera, ask the user to show all the Rubik’s Cube faces, and read the faces so I can feed them to a solver. Here are the steps I have tried so far: Sharpened square edges. Obtained binary image and removed noise. Detected and extracted squares. I tried different blurs and cv functions, but nothing worked reliably. Sometimes it gets all 9 squares, but sometimes it doesn't. There also seems to be a difference between colors; for example, the model can detect green squares more easily than yellow squares. Can anyone provide advice as to how I can detect the squares on the face? https://preview.redd.it/1ht9f4h31wqb1.png?width=2180&format=png&auto=webp&s=32d23515a43406c0f8828e6790ad71e754b0ab80 submitted by /u/uglyboi34 [link] [comments]  ( 9 min )
  • Open

    DynIBaR: Space-time view synthesis from videos of dynamic scenes
    Posted by Zhengqi Li and Noah Snavely, Research Scientists, Google Research A mobile phone’s camera is a powerful tool for capturing everyday moments. However, capturing a dynamic scene using a single camera is fundamentally limited. For instance, if we wanted to adjust the camera motion or timing of a recorded video (e.g., to freeze time while sweeping the camera around to highlight a dramatic moment), we would typically need an expensive Hollywood setup with a synchronized camera rig. Would it be possible to achieve similar effects solely from a video captured using a mobile phone’s camera, without a Hollywood budget? In “DynIBaR: Neural Dynamic Image-Based Rendering”, a best paper honorable mention at CVPR 2023, we describe a new method that generates photorealistic free-viewp…  ( 92 min )
    Re-weighted gradient descent via distributionally robust optimization
    Ramnath Kumar, Pre-Doctoral Researcher, and Arun Sai Suggala, Research Scientist, Google Research Deep neural networks (DNNs) have become essential for solving a wide range of tasks, from standard supervised learning (image classification using ViT) to meta-learning. The most commonly-used paradigm for learning DNNs is empirical risk minimization (ERM), which aims to identify a network that minimizes the average loss on training data points. Several algorithms, including stochastic gradient descent (SGD), Adam, and Adagrad, have been proposed for solving ERM. However, a drawback of ERM is that it weights all the samples equally, often ignoring the rare and more difficult samples, and focusing on the easier and abundant samples. This leads to suboptimal performance on unseen data, espe…  ( 92 min )
  • Open

    Accenture creates a Knowledge Assist solution using generative AI services on AWS
    This post is co-written with Ilan Geller and Shuyu Yang from Accenture. Enterprises today face major challenges when it comes to using their information and knowledge bases for both internal and external business operations. With constantly evolving operations, processes, policies, and compliance requirements, it can be extremely difficult for employees and customers to stay up […]  ( 8 min )
    Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs
    We’re excited to announce that Amazon SageMaker Canvas now offers a quicker and more user-friendly way to create machine learning models for time-series forecasting. SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate machine learning (ML) models without requiring any machine learning experience or having to write a single line of code. SageMaker […]  ( 7 min )
    Robust time series forecasting with MLOps on Amazon SageMaker
    In the world of data-driven decision-making, time series forecasting is key in enabling businesses to use historical data patterns to anticipate future outcomes. Whether you are working in asset risk management, trading, weather prediction, energy demand forecasting, vital sign monitoring, or traffic analysis, the ability to forecast accurately is crucial for success. In these applications, […]  ( 10 min )
    Create a Generative AI Gateway to allow secure and compliant consumption of foundation models
    In the rapidly evolving world of AI and machine learning (ML), foundation models (FMs) have shown tremendous potential for driving innovation and unlocking new use cases. However, as organizations increasingly harness the power of FMs, concerns surrounding data privacy, security, added cost, and compliance have become paramount. Regulated and compliance-oriented industries, such as financial services, […]  ( 13 min )
    Beyond forecasting: The delicate balance of serving customers and growing your business
    Companies use time series forecasting to make core planning decisions that help them navigate through uncertain futures. This post is meant to address supply chain stakeholders, who share a common need of determining how many finished goods are needed over a mixed variety of planning time horizons. In addition to planning how many units of […]  ( 11 min )
    Announcing New Tools to Help Every Business Embrace Generative AI
    From startups to enterprises, organizations of all sizes are getting started with generative AI. They want to capitalize on generative AI and translate the momentum from betas, prototypes, and demos into real-world productivity gains and innovations. But what do organizations need to bring generative AI into the enterprise and make it real? When we talk […]  ( 13 min )
  • Open

    How will the Big Data market evolve in the future?
    Big data has been around for some time now, becoming a more or less common concept in business. However, recent developments in AI technology have shaken up an already volatile field, inviting us to reconsider our projections of how the big data market will look in the future. We can already see the signs that… Read More »How will the Big Data market evolve in the future? The post How will the Big Data market evolve in the future? appeared first on Data Science Central.  ( 22 min )
  • Open

    Kicking Games Up a Notch: Startup Sports Vision AI to Broadcast Athletics Across the Globe
    Pixellot is scoring with vision AI — making it easier for organizations to deliver real-time sports broadcasting and analytics to viewers across the globe. A member of the NVIDIA Metropolis vision AI partner ecosystem, the company based near Tel Aviv offers an AI-powered platform that automates the capturing, streaming and analysis of sporting events. It’s Read article >  ( 7 min )
    V for Victory: ‘Cyberpunk 2077: Phantom Liberty’ Comes to GeForce NOW
    The wait is over. GeForce NOW Ultimate members can experience Cyberpunk 2077: Phantom Liberty on GOG.com at full GeForce RTX 4080 quality, with support for NVIDIA DLSS 3.5 technology. It’s part of an action-packed GFN Thursday, with 26 more games joining the cloud gaming platform’s library, including Quake II from id Software. A New Look Read article >  ( 8 min )
  • Open

    AI Frontiers: Measuring and mitigating harms with Hanna Wallach
    Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.    In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the […] The post AI Frontiers: Measuring and mitigating harms with Hanna Wallach appeared first on Microsoft Research.  ( 29 min )

  • Open

    How can AI recreate the lack of information?
    Hey there! Is there anyone here with a strong grasp of AI neural network logic? I've extracted a character from an anime scene using a mask, and saved it as a PNG sequence which contains solely the anime character along with an alpha (transparent) background. I'm curious about how the Flowframes neural network can recreate the background that was originally behind the character but removed by the mask. That should be impossible, since the PNG images don't contain that background info. Can anyone explain how this works? Attachments: - Image #1: https://preview.redd.it/z2bypfkstvqb1.png?width=1920&format=png&auto=webp&s=c534167c5ae4129c04f9b8b2fbca3bac350a1d4a - Image #2: https://preview.redd.it/x5kkzs2ttvqb1.png?width=1920&format=png&auto=webp&s=6838d7ca5e1e4f19ba46c04750fdaea537a787f0 (Don't mind the black background in the thumbnails, it's a bug, there's actually a transparent background) * Flowframes is an app that utilizes advanced AI frameworks to interpolate videos in order to increase their framerate in the most natural looking way possible. submitted by /u/drkysqrl [link] [comments]  ( 9 min )
    (Pt. 2) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Graph Feature vector (embedding) [D]
    Hey all, I’m trying to build a regression algorithm for a dataset where, for each patient, I have a graph representing a location in their brain from MRI images. Right now, I don’t have a ton of data, so I’m looking for some way to take each graph I have and get a feature vector for it to put into a regression algorithm. So for 100 patients, I have 100 graphs, and I’d like to have 100 feature vectors, one representing each patient's graph. My issue is trying to find some algorithm that takes in the entire graph and outputs a single feature vector. I’ve been looking at some libraries but they all seem wildly scattered. I don’t want to grab a bunch of node embeddings and do some elementary merge of them, like an average or sum, etc. Any help in pointing me to some Python libraries that can help me do this, or algorithms, or anything, would be appreciated. Thank you so much. submitted by /u/kaleb7589 [link] [comments]  ( 9 min )
    Normalization in VAE [D]
    I am training a variational autoencoder. First I tried applying batch normalization to the data before sending it to the network, and someone, probably wisely, pointed out that this is not correct. If I don't use batch norm, my training fails due to numerical instability. I then tried scaling my data beforehand using StandardScaler from sklearn, and now my training works. Is this reasonable? Any other thoughts? submitted by /u/Global-Gene2392 [link] [comments]  ( 9 min )
    [P] Predicted stock data with TensorFlow is very different from actual data
    I'm following a YouTube video to create a simple machine learning model to predict stock prices. I have to reshape my prediction data so it works with inverse_transform, but in the video he doesn't do this. If I don't reshape it I get an error, but I think when I do reshape it it messes with the data. The predicted values are all very similar. I've tried messing with epoch and batch sizes, and changing other metrics like prediction_days, but nothing has worked. This is what the prediction data looks like when plotted, and this is what it looks like when printed. Does anyone know what could be causing this? Here's my code submitted by /u/darkshadowtrail [link] [comments]  ( 9 min )
    New subreddit rule idea [D]
    This subreddit will continue to die if it doesn't foster discussion of the latest research and reduce low-quality posts. However, making a judgement as to what is or is not low-quality is time-consuming and subjective -- not something the mods should be doing. To this end, I had the following new rule idea: If it's your first time at Fight If it's your first post in this subreddit, it needs to be a link to arxiv (Or, more generally, the number of your non-arxiv posts cannot exceed the number of your arxiv posts) All arxiv posts must be standard links to the abstract page (to catch reposts and to connect discussions of the same paper in different subreddits) An arxiv post must be a paper you've read yourself, and you should post a comment describing what you liked and DIDN'T like about it (Let the airing of grievances begin! I think this will help seed the discussion, which is really the raison d'être of this subreddit) If the post or the comment get downvoted, they do not count. What do you think? Will this help steer this subreddit in the right direction? Is this enforceable? submitted by /u/we_are_mammals [link] [comments]  ( 9 min )
    [D] How feasible is it to complete a course.
    Hi, I am a physicist (1st year of a master's) and I decided to take an NN class (for CS students). I have decent experience with Python but I have never done low-level coding. The class project requires a C++ implementation of an NN with the backpropagation algorithm. I am quite confident in my learning ability; nonetheless, do you guys think it is feasible for me to code such a project in 13 weeks (I also have other subjects and can't just spend all my time on this)? Thanks submitted by /u/merimace [link] [comments]  ( 9 min )
    [P][D] Need Guidance on Building a Chatbot like ChatGPT for University Data - Help!
    Hey fellow Redditors, I find myself in quite a situation and could use some guidance. Recently, I introduced my professor to privateGPT and demonstrated its capabilities using a small set of college data. To my delight, he was impressed and has now tasked me with researching and developing a ChatGPT-like chatbot, but with access to our university's extensive data. Here's where I need your help: my professor wants this chatbot to be hosted on our university's systems due to privacy concerns, which means I can't use ChatGPT's API. I've been given access to Sol HPC, but I'm finding it quite confusing to get started. I'm looking for advice, tips, or any resources that can help me embark on this journey. Has anyone here worked on a similar project, or does anyone have experience with Sol HPC or building chatbots with local data sources? Any guidance or insights would be greatly appreciated! Thank you in advance for your help. This project means a lot to me, and I want to make sure I'm heading in the right direction. submitted by /u/ssshankyyy [link] [comments]  ( 9 min )
    [R] UNC Researchers Present VideoDirectorGPT: Using AI to Generate Multi-Scene Videos from Text
    Generating coherent videos spanning multiple scenes from text descriptions poses unique challenges for AI. While recent progress enables creating short clips, smoothly transitioning across diverse events and maintaining continuity remains difficult. A new paper from UNC Chapel Hill proposes VIDEODIRECTORGPT, a two-stage framework attempting to address multi-scene video generation: Here are my highlights from the paper: Two-stage approach: first a language model generates detailed "video plan", then a video generation module renders scenes based on the plan Video plan contains multi-scene descriptions, entities/layouts, backgrounds, consistency groupings - guides downstream video generation Video generation module called Layout2Vid trained on images, adds spatial layout control and cross-scene consistency to existing text-to-video model Experiments show improved object layout/control in single-scene videos vs baselines Multi-scene videos display higher object consistency across scenes compared to baselines Competitive open-domain video generation performance maintained The key innovation seems to be using a large language model to plot detailed video plans to guide overall video generation. And the video generator Layout2Vid adds better spatial and temporal control through some clever tweaks. The separation of these tasks seems to matter. You can read my full summary here. There's a link to the repo there too. Paper link is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Survival analysis in MATLAB [project]
    Hi everyone, I'm building a predictive algorithm to find DFS using Cox regression. I first used LASSO regression to select the predictive variables, and now I'm using the c-index to evaluate the predictive accuracy. It always equals 1 and I can't understand why (I tried reducing the number of variables just to see if it would change, but it didn't). Also, I'm working with censored data, of course. Can someone help me understand what I'm doing wrong? submitted by /u/bl4s3159 [link] [comments]  ( 9 min )
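To make the c-index in the post concrete, here is a minimal Python sketch of Harrell's concordance index with right-censoring. It is an illustration under stated assumptions, not the MATLAB code from the post (which is not shown): ties in survival time are simply skipped, higher risk scores are assumed to mean shorter survival, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) and a strictly shorter time than subject j.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [4, 3, 2, 1])
mixed = concordance_index(times, events, [4, 1, 2, 3])
constant = concordance_index(times, events, [0, 0, 0, 0])
```

A value of exactly 1 means every comparable pair was ranked correctly. Two things that may be worth checking in that situation (as possibilities, not a diagnosis) are whether the score is computed on the same data the model was fit on, and whether any selected variable encodes the follow-up time or event status.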
    [R] Can you help me validate my kmeans calculator for tensorflow faster rcnn model config ?
    My annotations are in Pascal VOC format. Below is a calculator I am testing. Not sure if I am calculating the scales and aspect ratios correctly. Please help.

        import os
        import xml.etree.ElementTree as ET
        import numpy as np
        from sklearn.cluster import KMeans

        def compute_scales_and_aspect_ratios(directory, n_clusters, img_size):
            widths = []
            heights = []
            for filename in os.listdir(directory):
                if not filename.endswith('.xml'):
                    continue
                fullname = os.path.join(directory, filename)
                tree = ET.parse(fullname)
                root = tree.getroot()
                for obj in root.iter('object'):
                    xmlbox = obj.find('bndbox')
                    w = float(xmlbox.find('xmax').text) - float(xmlbox.find('xmin').text)
                    h = float(xmlbox.find('ymax').text) - float(xmlbox.find('ymin').text)
                    widths.append(w)
                    heights.append(h)
            widths = np.array(widths) / img_size[1]  # Normalize by image width
            heights = np.array(heights) / img_size[0]  # Normalize by image height
            scales = np.sqrt(widths * heights).reshape(-1, 1)
            aspect_ratios = (widths / heights).reshape(-1, 1)
            kmeans_scales = KMeans(n_clusters=n_clusters, random_state=0).fit(scales)
            kmeans_aspect_ratios = KMeans(n_clusters=n_clusters, random_state=0).fit(aspect_ratios)
            return kmeans_scales.cluster_centers_, kmeans_aspect_ratios.cluster_centers_

        directory = "path_to_top_folder/batch-1"
        n_clusters = 5
        img_size = (640, 1024)
        scales, aspect_ratios = compute_scales_and_aspect_ratios(directory, n_clusters, img_size)
        print('Scales:', scales.flatten())
        print('Aspect Ratios:', aspect_ratios.flatten())

    submitted by /u/dpadhy [link] [comments]  ( 9 min )
    [P] Any available datasets of children’s books or stories?
    I am looking for training data consisting of children’s stories and associated grade level. Does anyone know of any publicly available or paid datasets like this? submitted by /u/SpellboundLRN [link] [comments]  ( 9 min )
    Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
    submitted by /u/cegras [link] [comments]  ( 9 min )
    [P] Tetris AI - Suggestions on direction to take from here? (One hot encoded dataset with 200 features)
    Hello! I'm working on a Tetris AI and am representing the 10x20 grid cubes with a one hot encoded dataset: https://www.kaggle.com/datasets/conlan/tetris-training-set-9262023 This means my data has 208 features (200 for the grid cubes being on/off, 7 for the "next shape" box, and 1 for the labeled best move). I currently have 9460 labeled samples and have done some preliminary fitting here: https://www.kaggle.com/code/conlan/tetris-ai?scriptVersionId=144388350 with a highest f1_macro score of 0.431090. Does anyone have suggestions for which direction to take from here to improve? Currently I see: Collect More Data, Tune Hyperparameters, Rework Features. I'm hesitant to rework the features, as that would require telling the model more specifics and I would like to keep it abstract, but maybe 200 is crazy high? Or maybe <10k samples is too low and I should just keep collecting data? Thanks in advance! submitted by /u/conlanrios [link] [comments]  ( 9 min )
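The feature layout described in the post (200 on/off grid cells plus a 7-way one-hot for the next shape) can be sketched roughly as below. The column order, the row-major flattening, and the piece letters in `PIECES` are assumptions for illustration; the linked dataset may order things differently.

```python
PIECES = "IOTSZJL"  # hypothetical ordering of the seven tetromino shapes


def encode_state(board, next_piece):
    """Flatten a 20-row x 10-column 0/1 board and append a one-hot next piece."""
    if len(board) != 20 or any(len(row) != 10 for row in board):
        raise ValueError("board must have 20 rows of 10 cells")
    if next_piece not in PIECES:
        raise ValueError("unknown piece: " + repr(next_piece))
    cells = [int(bool(cell)) for row in board for cell in row]  # 200 values
    one_hot = [1 if piece == next_piece else 0 for piece in PIECES]  # 7 values
    return cells + one_hot


# Empty board with a single filled cell in the bottom-left corner.
board = [[0] * 10 for _ in range(20)]
board[19][0] = 1
features = encode_state(board, "T")
```

This gives 207 input values per sample; the labeled best move would be kept as a separate target column rather than as an input feature.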
    [R] The Internal State of an LLM Knows When It's Lying
    Paper - https://arxiv.org/abs/2304.13734 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [D] Feature Transformation & Scaling
    Good morning everyone, I am currently reading Mr. Burkov's book, Machine Learning Engineering. He talks about a step that might be helpful before training an ML model: Feature Scaling. Furthermore, he adds that before Feature Scaling, you might do Feature Transformation (Log, Square,...) in order to make your data look normal and get better models. How true do you think this statement is? Do you also transform your features, and then scale them? How often do you do it? It is important for Regression or SVM, but do you also do it for other black box algorithms such as Random Forests? What are the best practices according to you? submitted by /u/dekozr [link] [comments]  ( 9 min )
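The transform-then-scale order asked about above can be illustrated with a small sketch. It assumes a non-negative, right-skewed feature, uses `log1p` as the transformation and z-scoring as the scaling step, and the function name and sample values are hypothetical; it is not taken from the book.

```python
import math


def log_then_standardize(values):
    """Apply log(1 + x) to compress the right tail, then z-score the result."""
    logged = [math.log1p(v) for v in values]
    mean = sum(logged) / len(logged)
    variance = sum((v - mean) ** 2 for v in logged) / len(logged)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("feature is constant; nothing to scale")
    return [(v - mean) / std for v in logged]


# A heavily skewed feature spanning four orders of magnitude.
raw = [1, 10, 100, 1000, 10000]
scaled = log_then_standardize(raw)
```

In a real pipeline the mean and standard deviation would be computed on the training split only and reused for validation and test data. For tree-based models such as random forests, splits depend only on the ordering of feature values, so a monotonic transformation like this one typically changes little.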
    AAAI 24 [Discussion]
    So no discussions are going on about AAAI 2024, or have I just been unable to find any? Opening this regarding Phase 1-2 and Results discussions if anyone wants to discuss. If there already is a thread, share! For an opening question, any idea about what percentages are rejected in desk rejection, phase 1 and finally phase 2? (Roughly of course) submitted by /u/atharvandogra [link] [comments]  ( 9 min )
    [D] GPT2 diagrams are wrong
    so if u go check the source code for gpt2 u can clearly see that the norm happens inside the attention and mlp layers, and that the add is separate. this is in the official openai github and is relatively easy to read: https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130 (thx KingsmanVince) for some reason all the online materials are saying that there is a full norm layer before the mlp instead of inside of it submitted by /u/rejectedlesbian [link] [comments]  ( 9 min )
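The ordering described in the post, where the norm is applied to the input of each sub-layer and the residual add happens separately, is often called "pre-LN". A minimal pure-Python sketch of that ordering is below. It is not the code from the linked file: `attn` and `mlp` are placeholder callables, and `layer_norm` omits the learned gain and bias for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]


def pre_ln_block(x, attn, mlp):
    """Norm feeds each sub-layer; the residual stream itself is never normalized."""
    a = attn(layer_norm(x))
    x = [xi + ai for xi, ai in zip(x, a)]  # residual add, separate from the norm
    m = mlp(layer_norm(x))
    return [xi + mi for xi, mi in zip(x, m)]


# Placeholder sub-layers (assumptions): identity "attention" and a zero "MLP".
identity = lambda h: list(h)
zeros = lambda h: [0.0] * len(h)

x = [1.0, 2.0, 3.0]
out = pre_ln_block(x, attn=identity, mlp=zeros)
```

With this ordering, the output is the untouched input plus each sub-layer's contribution, which is the difference from diagrams that draw a norm directly on the residual path between the attention and MLP blocks.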
    [D] ONNX or torchlib for on device training in C++
    Hi, I have recently been trying to reimplement a deep-learning-based object tracker in C++. However, the whole pipeline involves online training and weight updates. Is it possible to train an ONNX model in C++ using CUDA as the accelerator? If yes, how does the training speed (BP/update) compare to torchlib? I personally strongly prefer onnx, cuz it is much easier to deploy… submitted by /u/Independent_Bet1268 [link] [comments]  ( 9 min )
    [D] The quality of this sub
    Mods finally commented. The only time that mods were active was when they removed the cat meme. It has been a month since then. Let's see what mods have done to improve this sub. Here are some of the other obviously rule-breaking or off-topic posts that mods did NOT remove: a person asking for help with their motherboard; a person asking about statistics; a person asking for a machine learning roadmap; another asking-for-roadmap post; ... the list goes on with absolute beginner questions and low-quality posts. All these posts were written in less than 1 week. As we can see, mods do nothing. They only remove posts that call them out. Here are posts where people discuss the status of this sub: 17 Sep 2023, 2 Sep 2023, 1 Aug 2023. Questions for mods: where are you when people complain? Why do you only show up when someone calls you out? We have a few options: report the mods and the sub as unmoderated (see this 1 and this 2); find other communities; gatekeep harder and tell people to go to r/learnmachinelearning, r/MLQuestions, r/cscareerquestions, or r/languagetechnology. submitted by /u/March8365 [link] [comments]  ( 9 min )
    [D] Model release v0.1 from MistralAI
    EDIT: They released the model weights on HF (https://huggingface.co/mistralai) under an Apache 2.0 License. They also updated their website with documentation on how to use/run: https://docs.mistral.ai Note: I am not affiliated with Mistral AI. Via their Twitter X account: magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=https%3A%2F%2Ftracker1.520.jp%3A443%2Fannounce https://preview.redd.it/0o46ls925rqb1.png?width=1306&format=png&auto=webp&s=7ff7ca3a510577e9ecdaa3c9ccb7ef763acc0780 submitted by /u/Fluid-Age-9266 [link] [comments]  ( 9 min )
    [D]Finetune t5 for classification but not seeing loss reduction
    I am wondering if anyone has run into this before. I have finetuned flan-t5-xl for a classification task by generating one token from the decoder. The finetuning process looks OK. I want to convert this into a T5 encoder with a head to save memory. I am using huggingface T5ForSequenceClassification. However, the loss does not actually decrease; it bounces around a certain value. What could be wrong? I have tried a few learning rates and tuned other hyperparameters. submitted by /u/Chen806 [link] [comments]  ( 9 min )
    [R] Microsoft Researchers Propose DIT Morality Test for LLMs To Quantify AI Moral Reasoning Abilities
    Researchers from Microsoft have just proposed using a psychological assessment tool called the Defining Issues Test (DIT) to evaluate the moral reasoning capabilities of large language models (LLMs) like GPT-3, ChatGPT, etc. The DIT presents moral dilemmas and has subjects rate and rank the importance of various ethical considerations related to the dilemma. It allows quantifying the sophistication of moral thinking through a P-score. In this new paper, the researchers tested prominent LLMs with adapted DIT prompts containing AI-relevant moral scenarios. Key findings: Large models like GPT-3 failed to comprehend prompts and scored near random baseline in moral reasoning. ChatGPT, Text-davinci-003 and GPT-4 showed coherent moral reasoning with above-random P-scores. Surprisingly, the smaller 70B LlamaChat model outscored larger models in its P-score, demonstrating advanced ethics understanding is possible without massive parameters. The models operated mostly at intermediate conventional levels as per Kohlberg's moral development theory. No model exhibited highly mature moral reasoning. I think this is an interesting framework to evaluate and improve LLMs' moral intelligence before deploying them into sensitive real-world environments - to the extent that a model can be said to possess moral intelligence (or, seem to possess it?). Here's a link to my full summary with a lot more background on Kohlberg's model (had to read up on it since I didn't study psych). Full paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
  • Open

    Any good AI newsletters? I'm tired
    Any good AI (low-hype) newsletters/blogs? That's ideally sent <= 4 times a month? I'm tired of the amount of AI news I have to go through daily just to keep up. submitted by /u/onteri [link] [comments]  ( 9 min )
    AI is taking jobs away from Chinese streamers and online retailers
    AI-generated deepfake clones of Chinese livestream influencers are becoming popular on e-commerce platforms. These clones can work 24/7 and help brands sell their products without the need for human streamers. Chinese startups and tech companies are offering the service of creating these deepfake avatars for a cost of around $1,000. The technology has evolved over the years, with the need for training videos decreasing from 30 minutes to just one minute. The AI clones can mimic the movements and speech of human streamers, making them an affordable and efficient alternative for smaller brands. Source : https://www.technologyreview.com/2023/09/19/1079832/chinese-ecommerce-deepfakes-livestream-influencers-ai/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Using language models for code generation works better when limited to a specific domain
    Automatic code generation has always been an integral part of programming: compilers, synthesis tools, converters, etc. are examples of classic code generators. Now, with such powerful LLMs at hand, it is only natural to try to find new ways to generate code. The question is: are LLMs the right tool for code generation? There are two sides to code generation: (1) understanding the intent (a.k.a. capturing the spec) and (2) writing the code. LLMs are great for (1), but not so good for (2). This is an example of using an LLM for general-domain code generation: https://github.com/RoboCoachTechnologies/GPT-Synthesizer You can see that the main focus here is to properly capture the spec, and that's where LLMs shine. An LLM's solution for general-domain code generation may not be complete or optimized. It is always easier to break the problem down and solve code generation in a specific domain. Here you can see how much better and cleaner the output of code generation can be when it is limited to a specific domain (the robotics domain, ROS in particular, in this case): https://github.com/RoboCoachTechnologies/ROScribe What are your thoughts on using LLMs for code generation? submitted by /u/RoboCoachTech [link] [comments]  ( 9 min )
    How to stop AI deepfakes from sinking society — and science
    submitted by /u/waozen [link] [comments]  ( 9 min )
    Even the CIA is developing an AI chatbot
    The CIA is developing an AI chatbot similar to ChatGPT to help US intelligence agencies sift through large amounts of information. The program will train on publicly available data and provide sources for agents to confirm their validity. The tool will allow agents to look up information, ask follow-up questions, and summarize daunting masses of data. The exact nature of what constitutes 'public data' could spark privacy issues. The tool will be distributed to the 18-agency US intelligence community, but not to lawmakers or the public. Source : https://www.engadget.com/even-the-cia-is-developing-an-ai-chatbot-192358767.html submitted by /u/NuseAI [link] [comments]  ( 9 min )
    UNC Researchers Present VideoDirectorGPT: Using AI to Generate Multi-Scene Videos from Text
    Generating coherent videos spanning multiple scenes from text descriptions poses unique challenges for AI. While recent progress enables creating short clips, smoothly transitioning across diverse events and maintaining continuity remains difficult. A new paper from UNC Chapel Hill proposes VIDEODIRECTORGPT, a two-stage framework attempting to address multi-scene video generation: Here are my highlights from the paper: Two-stage approach: first a language model generates detailed "video plan", then a video generation module renders scenes based on the plan Video plan contains multi-scene descriptions, entities/layouts, backgrounds, consistency groupings - guides downstream video generation Video generation module called Layout2Vid trained on images, adds spatial layout control and cross-scene consistency to existing text-to-video model Experiments show improved object layout/control in single-scene videos vs baselines Multi-scene videos display higher object consistency across scenes compared to baselines Competitive open-domain video generation performance maintained The key innovation seems to be using a large language model to plot detailed video plans to guide overall video generation. And the video generator Layout2Vid adds better spatial and temporal control through some clever tweaks. The separation of these tasks seems to matter. You can read my full summary here. There's a link to the repo there too. Paper link is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Cyberpunk Multiverse
    I created this cyberpunk inspired short using Midjourney to create the pictures, RunwayML to animate them, and CapCut on iOS to edit them together. I know the animation is still in early stages, but what do you think? Do you think we could have full length movies in a couple years? submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
    A Simple Checklist for Self-Evaluating Prompt Quality
    How do you evaluate the quality of your prompt outputs? Here's a handy checklist. Let's have a look! You can also join r/PromptWizards to find more tutorials and prompts! Part 1: Understanding AI's Understanding You've presented a prompt to your AI, the next questions are: Has the AI accurately grasped the context? If not, how can I make sure the LLM steers my context better, should I be more direct and clear in my prompt? Can I be less negative (shows to perform less) and be more guiding to the LLM? Do the responses directly address the question or topic? Was my query and task/instruction clearly detailed in enough depth that the LLM understood what I expect? Are there any contradictions between different responses to the same prompt? If I run my prompt multiple times, i…  ( 10 min )
    OpenAI’s GPT-4 With Vision Still Has Flaws, Reveals Paper
    OpenAI's much-touted model GPT-4, lauded for its multimodal abilities, including advanced image recognition, still has significant flaws. These glitches range from inventing facts to misinterpreting chemicals' images and hate symbols, according to a new paper from OpenAI. To stay ahead of AI developments, look here first. https://preview.redd.it/seg5x4zn3uqb1.png?width=1108&format=png&auto=webp&s=635a6c58cf6255f62d8eae3077678864e5b0e248 Unintended GPT-4V behaviors GPT-4V has a tendency to hallucinate or invent facts with unwarranted confidence. The model struggles to make correct inferences, sometimes creating fictional terms by wrongly combining text strings. It misinterprets certain symbols of hate and can give incorrect answers in the context of medical imaging. OpenAI’s mitigation strategies OpenAI has implemented various safeguards to prevent GPT-4V's misuse, such as breaking CAPTCHAs or using images to infer personal details. The company insisted that GPT-4V is not to be used for identifying dangerous chemicals from image structures. OpenAI acknowledged it has a long way to go in refining the model and is working on it. Discrimination and bias When OpenAI’s production safeguards are disabled, GPT-4V displays bias against certain sexes and body types. The paper reported offensive responses related to body positivity when prompted by an image of a woman in a bathing suit. (source) P.S. If you like this kind of analysis, I write a free newsletter that dissects the most impactful AI news and research. 1000s of professionals from Google, Meta, and OpenAI read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    New Bing browser, same Bing results. Score was 10-27 btw.
    submitted by /u/degrudv [link] [comments]  ( 9 min )
    Are language Models being nerfed?
    In using AI and asking it to do simple tasks like "explain this in more simple terms" or asking it to make flashcards for me in a certain format, I am really convinced that language models (Bard and OpenAI specifically) are being nerfed. They cannot understand simple instructions as well anymore. I had a paragraph of information for one of my classes that I wanted it to make more straightforward for me before I actually went to class the next day. I spent like 30 minutes trying to get it to do that and eventually just ended up giving up. Why don't language models feel as sharp as they did, say, a year ago? I wish I had more examples to share. Am I the only one who's noticed this? submitted by /u/Bojof12 [link] [comments]  ( 9 min )
    Looking For The Best AI Art Generator? Look No Further! (Definitive Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Looking to change my own voice for audio production
    Hi all – I’m new to this sub-Reddit, so hopefully I’m in the right place. I am working on an audio production that will span multiple episodes and hopefully multiple seasons. It will require many characters, ranging in gender, age, ethnicity, etc. I am a decent voice actor and can do many of the roles myself, but some of them I cannot fake using my voice alone. My budget is very limited, so I was hoping to find some type of software I can use to change my voice for the production. This can be during the recording process, or after recording… As long as it gets the job done, and makes me sound like someone else entirely. Does anybody know of software that can achieve this? Most of the software I found is specifically designed to change the user's voice on the spot and is aimed at gamers changing their voice for live streams or in-game chats. I’m also on a Mac, which I know will be limiting. I’m having a hard time finding something I can use. Any suggestions will be helpful. Thank you! EDIT: To clarify, I don’t want to just change my voice to sound different in general. I want to specifically sound like a woman, an elder man, or someone of a different ethnic background. Those are just a few examples. submitted by /u/nopetoocreepy [link] [comments]  ( 9 min )
    I asked AI to create a religion and this is what it created (it's weird)
    submitted by /u/GABIBBOPAZZOCINESE [link] [comments]  ( 9 min )
    My uneducated opinion on where we are going with AI - video essay.
    submitted by /u/rattuspuer [link] [comments]  ( 9 min )
    This Article Was Written Half By A Human... And Half By AI.
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    CGPT-4, describe what the US would look like today had the insurrection succeeded.
    Predicting alternative historical scenarios is always fraught with complexity and uncertainty, especially concerning highly politically charged topics. However, let's entertain the hypothetical situation where the insurrection following the 2020 U.S. presidential election had succeeded, with the understanding that this is speculative reasoning. In this alternate reality, the immediate consequences would be a constitutional crisis of unprecedented proportions. Faith in democratic institutions would be severely eroded, both domestically and internationally. Trump's retaining power in this manner would spark widespread protests, perhaps more intense and larger than those seen in the summer of 2020. The unrest would likely lead to a governmental response that could be more authoritarian, poss…  ( 10 min )
    Everest.
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 9 min )
    AI art generator
    Hey, I'm hoping to get a bit of help finding an art generator to play around with. My only concern is giving away all my personal information. Are there any apps for Android where the ToS aren't crazy invasive, by any chance? submitted by /u/Fluffy_Discount_9692 [link] [comments]  ( 9 min )
    Deepfake election risks trigger EU call for more generative AI safeguards
    The European Union is urging the implementation of more safeguards against the risks posed by generative AI tools to free and fair debate in democratic societies, especially during elections. The EU's values and transparency commissioner has highlighted the potential threat of AI-generated disinformation to elections and called for platforms to be vigilant and provide efficient safeguards. Mainstream platforms have made initial efforts to address the risks by implementing safeguards to inform users about the synthetic origin of content posted online. The EU commissioner is meeting with representatives from OpenAI to discuss the issue. An incoming pan-EU AI regulation, the EU AI Act, is expected to make user disclosures a legal requirement for generative AI technologies. The EU's voluntary anti-disinformation Code has 44 signatories, including major social media and search platforms, as well as entities from the ad industry and civil society organizations. Google, one of the signatories, has published a report discussing the social impacts of AI and its commitment to developing technology responsibly. Google Search has published guidance on AI-generated content and plans to integrate new innovations in watermarking, metadata, and other techniques into its generative models. The EU's Code of Practice on Disinformation is seen as a stop-gap measure until the EU AI Act is adopted and mandatory deepfake disclosures are enforced. Source : https://techcrunch.com/2023/09/26/generative-ai-disinformation-risks/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/26/2023
    Chinese media reported that BIDU’s Baidu AI Cloud has released ACE 3.0, an intelligent traffic solution comprehensively restructured using a foundation model. ACE means Autonomous Driving, Connected Road, and Efficient Mobility respectively.[1] BCG consultants solving business problems with OpenAI’s GPT-4 performed 23% worse than those without it, new study finds.[2] CIA Builds Its Own Artificial Intelligence Tool in Rivalry With China.[3] Facebook parent is developing bots with personalities, including a ‘sassmaster general’ robot that answers questions.[4] Sources: [1] http://www.aastocks.com/en/stocks/news/aafn-con/NOW.1296238/popular-news/AAFN [2] https://finance.yahoo.com/news/bcg-consultants-solving-business-problems-081532840.html [3] https://www.bloomberg.com/news/articles/2023-09-26/cia-builds-its-own-artificial-intelligence-tool-in-rivalry-with-china#xj4y7vzkg [4] https://www.wsj.com/tech/ai/meta-ai-chatbot-younger-users-dab6cb32 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How do I turn images into landscapes?
    I was wondering how someone made the destroyed building look like a cat. Anyone know how to do this? submitted by /u/Agitated-Court-2871 [link] [comments]  ( 9 min )
    Getting an A6000. What interesting things can I do with it?
    As title, I’ll be getting my hands on a couple of decent GPUs, including an old A6000, and am excited for everything its 48GB of VRAM unlocks. What’s something interesting I should do with it? A few things off the top of my head: see what crazy things stable diffusion generates at an insane resolution (how high of a resolution would 48GB allow?); train good Dreambooth models (or what newer methods are there for style and object training?); run and compare various open-source LLMs (should be able to run 70b models?); generate something of decent length with MusicGen; Gaussian Splatting; distribute voice recognition, TTS, audio2face, LLM, and rendering across 2 or 3 machines to create a realistic virtual human (suggestions for excellent TTS would be appreciated). What other interesting models are out there to experiment with? submitted by /u/DsDman [link] [comments]  ( 9 min )
  • Open

    Online Training from Demonstrations
    I would like to embark on online training for an F1TENTH racing car, starting from scratch and leveraging demonstration data. Currently, it appears that DDPGfD is a promising approach. Does anyone have any research papers they can recommend or suggestions on how to get started? submitted by /u/anointedninja [link] [comments]  ( 9 min )
    "What If the Robots Were Very Nice While They Took Over the World?" (reflections on CICERO & _Diplomacy_)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Advice on getting started with a career in reinforcement learning
    Reinforcement learning has grabbed my interest pretty firmly and has been my focus for the past 3 months or so. I spend most of my time working in Python, Rust, and now Mojo. Not an expert yet but my coding skills are improving. I have no degree and have taught myself most of what I know. That part is why I'm looking for advice from you all. Practically every job post I've seen has college requirements. Is it unlikely to get hired without a degree? Additional information: I'm currently working on projects for github but those aren't quite done. My main interest is RL in game design: applications of distributional RL in action-dense environments and VR. Currently using Godot engine the most and have used pytorch, openai gym, and tensorflow (to a lesser degree). The abstract concepts of neural networks come easily to me and I've been following basic neurology as well. submitted by /u/SchrodingersCog [link] [comments]  ( 9 min )
    How to modify DQN to not overfit for action that concludes episode
    Edit: I may be jumping the gun here but I think I figured it out (looks good so far). I give the episode reward for every action EXCEPT the "end early" action, now I will need to give some boost for shorter episodes to achieve the desired effect :) I feel like I'm experiencing déjà vu, posing another DQN-related question. But, here's my issue: I've set up an environment where an agent can interact for 40 steps or choose to end the interaction early with a specific action. The catch is that the reward is only given at the end of the episode, which seems to be leading the agent to strongly favor the "end early" action. Despite all other steps getting a reward of 0, I assumed the long-term reward estimate, V(s_{t+1}), would mitigate this, but the agent still heavily gravitates towards ending the episode early. Attempted Solutions: Distributed the end-of-episode reward across all prior actions taken by the agent. Considering: Replacing the "end early" action with a "do nothing" action, allowing the episode to always play out in full. However, this seems like it could introduce additional computational costs and noise. Has anyone encountered a similar problem? I'd appreciate any advice or recommendations. submitted by /u/Vae94 [link] [comments]  ( 9 min )
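    One way to see why the "end early" action can look attractive early in training is to write out the standard one-step DQN target. This is a minimal sketch, not the poster's code: the function name, the two-action Q-value lists and the reward of 1.0 are assumptions for illustration.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target: bootstrap from the next state unless the episode ended."""
    if done:
        # Terminal transition: no future value, the target is the reward itself.
        return reward
    return reward + gamma * max(next_q_values)


# "End early" transition: terminal, and it is the only step that sees the reward.
end_target = td_target(reward=1.0, next_q_values=[0.0, 0.0], done=True)

# Ordinary step with reward 0: with untrained (near-zero) Q-values the target is 0.
cold_step_target = td_target(reward=0.0, next_q_values=[0.0, 0.0], done=False)

# Same step once the next-state estimates have learned something.
warm_step_target = td_target(reward=0.0, next_q_values=[0.2, 1.5], done=False)

print(end_target, cold_step_target, warm_step_target)
```

    Until the bootstrapped estimates for non-terminal steps have propagated back from the end of the episode, the terminating action is the only one with a non-zero target, which is consistent with the bias described above. It is one possible explanation, not a diagnosis of this particular setup.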
    DeepMind built an excellent Stratego bot. Can I get an ELI5 of the underlying technologies, DeepNash and R-NaD?
    I learned about DeepNash and R-NaD yesterday. I read the Deepmind article, Science paper, and the source code of rnad.py. But I don't think I understand it! Part of this is that they didn't define all the terms and Greek they use in the paper, and part of it is that I don't have academic ML experience. Below is my attempt to summarize the paper in non-academic terms. I'm trying to show that I did my homework, and also I'm trying to invoke Cunningham's Law in the hopes that someone will come along and correct me. Here goes: Naïve reinforcement learning doesn't work with simultaneous choice games such as matching pennies or Rock-Paper-Scissors. In naïve RL, if I choose Rock as my move, my opponent chooses Scissors as their move, and I see that I won, that will reinforce a belief that Rock is a "good" move and Scissors is a "bad" move. But this isn't true! This means that, during selfplay, a naïve RL agent will just cycle through strategies, as the timestep-(τ_n) agent learns how to beat the timestep-(τ_n-1) agent. The agent will never learn that RPS is a game about staying unpredictable! R-NaD fixes this by adjusting the reward function. I think "regularizing" is ML-academic speak for "adjusting". It adjusts the reward function in such a way that the agent will converge to a Nash equilibrium strategy. The paper's equation (1) describes how the regularization works. They didn't explain all the terms, though. I still don't know what a_i represents. But I think it corresponds to parts of the code like this line and this line. The key is that we're merging policies from multiple epochs and making sure that the current agent's move probabilities fare well against not only itself, but also against its previous two generations. They've proven that three generations is all you need to eventually converge to a Nash equilibrium. So... that's my understanding. Does anyone with actual ML experience want to weigh in? submitted by /u/lord_braleigh [link] [comments]  ( 10 min )
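    The cycling problem described in the summary can be reproduced with a few lines of Rock-Paper-Scissors. This sketch only illustrates why always best-responding to the previous generation never settles; it does not implement R-NaD or its reward regularization, and all names here are made up for the example.

```python
ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(action, opponent_policy):
    """Expected payoff of a pure action against a mixed opponent policy."""
    return sum(p * payoff(action, b) for b, p in opponent_policy.items())


def best_response(opponent_policy):
    """Pure action with the highest expected payoff (ties go to the first in ACTIONS)."""
    return max(ACTIONS, key=lambda a: expected_payoff(a, opponent_policy))


def pure(action):
    """Policy that always plays one action."""
    return {a: 1.0 if a == action else 0.0 for a in ACTIONS}


# Each generation best-responds to the previous one and becomes the next opponent.
policy = pure("rock")
history = []
for _ in range(6):
    move = best_response(policy)
    history.append(move)
    policy = pure(move)
print(history)  # paper, scissors, rock, paper, scissors, rock

# Against the uniform policy every action is worth the same, so nothing is exploitable.
uniform = {a: 1 / 3 for a in ACTIONS}
uniform_values = [expected_payoff(a, uniform) for a in ACTIONS]
print(uniform_values)
```

    The best-response chain loops forever, while the uniform mix leaves the opponent with no profitable deviation, which is the Nash equilibrium the post says the agent should learn.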
  • Open

    Re-imagining the opera of the future
    The iconic sci-fi opera “VALIS,” first composed by Professor Tod Machover in 1987, reboots at MIT for a new generation.  ( 11 min )
    From physics to generative AI: An AI model for advanced pattern generation
    Inspired by physics, a new generative model PFGM++ outperforms diffusion models in image generation.  ( 10 min )
  • Open

    A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction
    The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline […]  ( 13 min )
    MDaudit uses AI to improve revenue outcomes for healthcare customers
    MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to […]  ( 5 min )
  • Open

    DENZA Unwraps Smart Driving Options for N7 Model Lineup, Powered by NVIDIA DRIVE Orin
    DENZA, the luxury electric-vehicle brand and joint venture between BYD and Mercedes-Benz, is debuting new intelligent driving features for its entire N7 model lineup, powered by the NVIDIA DRIVE Orin system-on-a-chip (SoC). The N7 series was introduced earlier this year as a family of spacious five-seater SUVs for commuters looking to sport a deluxe EV […]  ( 5 min )
    The Fastest Path: Healthcare Startup Uses AI to Analyze Cancer Cells in the Operating Room
    Medical-device company Invenio Imaging is developing technology that enables surgeons to evaluate tissue biopsies in the operating room, immediately after samples are collected, providing in just three minutes AI-accelerated insights that would otherwise take weeks to obtain from a pathology lab. In a surgical biopsy, a medical professional removes samples of cells or tissue […]  ( 6 min )
    NVIDIA Works With NTT DOCOMO to Launch World’s First GPU-Accelerated 5G Network
    As generative AI sweeps across corporate boardrooms around the world, global telecommunications companies are exploring how to cost-effectively deliver many of these new AI applications to the edge over 5G and upcoming 6G networks. Telcos plan to deploy over 17 million 5G microcells and towers worldwide by 2025. Building, managing and optimizing this new infrastructure […]  ( 6 min )
  • Open

    Research Focus: Week of September 25, 2023
    Chunked prefills & decode-maximal batching boost LLM inference; DragNUWA combines text, image, and trajectory for fine-grained video content control; reconstructing images from human brain signals; structural inequalities in creator-audience relationships. The post Research Focus: Week of September 25, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    Circular coordinate art
    About three years ago I ran across a strange coordinate system in which familiar functions lead to interesting plots. The system is called “circular coordinates” but it is not polar coordinates. This morning I was playing around with this again. Here’s a plot of f(x) = x. And here’s a plot of f(x) = cos(8x). […] Circular coordinate art first appeared on John D. Cook.  ( 5 min )

  • Open

    [D] Implementation of ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization
    I found the paper “ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization”, published in March, and I was looking for information about the cost of training such a system. Has anyone tried it? Are there any already-trained weights available for the instructor model? I have found the GitHub repo associated with the paper, but it only contains the training code, with no information about the approximate number of tokens used or anything like that. submitted by /u/Agreeable-Committee6 [link] [comments]  ( 9 min )
    [P] Interact with an OWL-ViT Object Detection Model
    We noticed a lot of people wanting to deploy computer vision models, so we built an interactive demo of OWL-ViT to show how it might be used by an end user when integrated into a product. OWL-ViT is a new object detection model from the team at Google Research. It allows you to identify an object in one image (the “query image”) and then find that same object in any number of target images. Here is the link to interact with an OWL-ViT model! submitted by /u/modelbit [link] [comments]  ( 9 min )
    Question about dataset [D]
    Hey everyone, I'm a novice at ML and trying to do a project on my own. I am trying to predict the rainfall amount in inches for a given day. I’ve decided to make it a classification problem and predict the zone of rainfall, such as 0-0.5 inches or 1-1.5 inches. My data set has ~40,000 samples; however, I have noticed that 24,000 of them have 0.0 as the amount of rainfall, and a high percentage of the rest are very low, below 0.5 inch. I’m wondering whether there’s still a way to create the type of model I had originally intended. Is there a way to reduce the size of my data set, specifically the number of low values, without losing important feature information? Thank you, and any help is appreciated :) submitted by /u/RepeatResponsible499 [link] [comments]  ( 9 min )
    [D] Asus ROG Zephyrus vs Macbook Pro for ML (PhD Student)
    Hi all, I understand it all comes down to personal preference and that it is an old topic, but a bit of advice would be welcome. My current workload consists of analyzing large medical records and medical images (upcoming work), mainly with PyTorch. I have direct and remote access to my personal lab PC, which has this configuration: Core i9-9900K, 32 GB RAM, GTX 2080Ti 12 GB, Windows 11. I am now planning to buy a laptop that would help with coursework, research paper reading and remote access to my lab PC. It should last at least 4-5 years (my current 5-year-old MSI laptop's hinge broke). I have the following laptops in mind, with a budget of around $2000: 14-inch MacBook Pro with 16 GB RAM and M2 Pro = $1999; ASUS ROG Zephyrus 15.6" WQHD 165Hz Gaming Laptop, AMD Ryzen 9 6900HS, 16GB DDR5 4800MHz RAM, 1TB SSD PCIe 4.0 Storage, NVIDIA GeForce RTX 3060 = $1400. submitted by /u/Furiousguy79 [link] [comments]  ( 9 min )
    Is Rust a thing in ML? [D]
    I've been seeing some people say that Python is for training models and Rust is for deploying them. Is it a widespread practice, or is it just a localized need for companies with "performance sensitive" models? submitted by /u/horace_desplein [link] [comments]  ( 9 min )
    [D] Announcing Boomerang - Vectara's new embedding model
    Happy to share Vectara's new state-of-the-art embedding model, called Boomerang. Embedding models have so far not been in the spotlight as much as chat models like ChatGPT, but for retrieval-augmented generation (RAG) applications, getting the best embedding model matters a lot. I would love to hear what the experience of others has been in this respect: what embedding models have worked best so far with RAG? Blog post: https://vectara.com/introducing-boomerang-vectaras-new-and-improved-retrieval-model/ Hackernews: https://news.ycombinator.com/item?id=37661359 submitted by /u/ofermend [link] [comments]  ( 9 min )
    [R] Automated Quality Assurance for Object Detection Datasets
    Would you deploy a self-driving car model that was trained on images for which data annotators accidentally forgot to highlight some pedestrians? Errors in object detection examples found via cleanlab. Annotators of real-world object detection datasets often make such errors and many other mistakes. To avoid training models on erroneous data and save QA teams significant time, you can now use automated algorithms invented by our scientists. Our newest paper introduces Cleanlab Object Detection: a novel algorithm to assess label quality in any object detection dataset and catch errors (named ObjectLab for short). Extensive benchmarks show Cleanlab Object Detection identifies mislabeled images with better precision/recall than other approaches. When applied to the famous COCO dataset, Cleanlab Object Detection automatically discovers hundreds of mislabeled images, including errors where annotators mistakenly: overlooked an object that should’ve had a bounding box, sloppily drew a box in a poor location, or chose the wrong class label for an annotated object. We’ve open-sourced one line of code to find errors in any object detection dataset via Cleanlab Object Detection, which can utilize any existing object detection model you’ve trained. For those interested, you can check out the 5-minute tutorial to get started and the blog to read the details. submitted by /u/jonas__m [link] [comments]  ( 9 min )
    [R] 🤖🎸 Need directions to embed and query structured table data for a music recommendation system
    Hi there community, I hope everyone is doing well ::] I’m exploring the ada-002 embedding model for building a recommendation system (along with some other similarity search things like generating playlists), so naturally a lot of questions started to pop up. But before going deeper, let me explain what I am building and how the data is structured: Imagine a music app with song recommendations based on all the users' history and musical metadata. Currently I have a table with a little data in it just for tests - the users, the artists and the songs. Each of these columns has its own rows; for example, songs have genres, danceability, number of likes, etc. I am now implementing two more columns for history logs - a “history” (that will be related to users and songs) and a “session” (which is a coll…  ( 12 min )
    [P] Where can I find Pre-Annotated images dataset
    I am trying to do an object detection project. Does anyone know where I can find a pre-annotated image dataset? submitted by /u/Nomadic-Foe-011 [link] [comments]  ( 9 min )
    [R][P][D] Scene Encoder like ViT L/14 from CLIP but for 3D Scenes
    I'm working on my thesis and want to perform 3D scene understanding and VQA. My scenes would be textured meshes (or point clouds). My goal is to know not only the objects present in the scene but also the spatial relationships between them, e.g. the chair is in front of the couch, the bottle is on the table, etc. I want to know if there is a 3D scene encoder like the 2D image encoder ViT L/14 from CLIP. My search hasn't turned up much in this direction yet, but I have come across papers that render a 3D scene from multiple angles and then use 2D scene encoders on the renders. So I'd like to ask the community: Are there 3D scene encoders like CLIP ViT? If not, is there any other way I can approach this problem? submitted by /u/Bluebird705 [link] [comments]  ( 9 min )
    [Research] Exciting New Paper on StyleGAN Domain Adaptation: StyleDomain - ICCV 2023
    Hey, fellow machine learning enthusiasts! AIRI researchers are thrilled to share some exciting news with you all. Our paper, "StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation", has been accepted to ICCV 2023! 🥳 Abstract: Domain adaptation of GANs is a problem of fine-tuning GAN models pretrained on a large dataset (e.g., StyleGAN) to a specific domain with few samples (e.g., painting faces, sketches, etc.). While there are many methods that tackle this problem in different ways, there are still many important questions that remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. We perform a detailed exploration of the most i…  ( 10 min )
    [D] What are some good AI tools to help you in your own 2D digital art. Softwares or apps that help you improve and speed up your drawing/colouring process.
    Title pretty much says it all. It would be really cool if we have more AI tools that don't just straight up generate an image but help artists in their own art process. submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [D] How did you succeed in a new role? What lessons did you take from your previous role?
    When switching to a new role, what did you do to ensure that you succeeded? What lessons did you learn from your previous job that you took into your new job? For example, I'm in the process of switching jobs, and one of the things I’ve learnt is that when delivering results (during fire drills), the way I write my code is focused on simply getting the results out rather than being organized, efficient and scalable. While I get from point A to point B, the way I get there is not the most efficient. I think something I can do is take a step back and take a top-down approach to problem solving when I enter my new role. submitted by /u/Terrible-Hamster-342 [link] [comments]  ( 9 min )
    [N] NEXT WEEK ICCV - Feel at ICCV as if you were at ICCV!
    The International Conference on Computer Vision (ICCV 2023) takes place next week in Paris. If you are not going, stay in touch by subscribing to the ICCV Daily magazine. It's free: https://www.rsipvision.com/feel-iccv-iccv/ Full daily previews and reports of selected ICCV papers and events. https://preview.redd.it/yxmf2ksomlqb1.jpg?width=794&format=pjpg&auto=webp&s=7063c770e7a02d0ca7bba6f41ecc36438aa86256 submitted by /u/Gletta [link] [comments]  ( 9 min )
  • Open

    When there is only one group of a given size
    Today’s date, US style, is 9/26/2023, and there is only one group, up to isomorphism, of size 9262023. You could verify this in Mathematica with the command FiniteGroupCount[9262023] which returns 1. For a given n, when is there only one group of size n? There are two requirements. First, n has to be the product […] When there is only one group of a given size first appeared on John D. Cook.  ( 5 min )
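The claim about 9262023 can also be checked without Mathematica. The following is a minimal sketch, assuming the standard result (not spelled out in the truncated excerpt) that there is exactly one group of order n precisely when gcd(n, φ(n)) = 1, where φ is Euler's totient function. The helper names `totient` and `has_unique_group` are illustrative only.

```python
from math import gcd


def totient(n: int) -> int:
    """Euler's totient, computed by trial-division factorization."""
    result = n
    m = n
    p = 2
    while p * p <= m:
        if m % p == 0:
            # Strip every copy of the prime p, then apply the factor (1 - 1/p).
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        # Whatever remains is a single prime factor larger than sqrt(n).
        result -= result // m
    return result


def has_unique_group(n: int) -> bool:
    """True when gcd(n, phi(n)) == 1 (assumed criterion for a unique group of order n)."""
    return gcd(n, totient(n)) == 1


print(has_unique_group(9262023))
```

Trial division is adequate for a seven-digit number; a much larger n would call for a proper factoring routine.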
    Analogy between prime numbers and simple groups
    Simple groups are the building blocks of groups similar to the way prime numbers are the building blocks of integers. This post will unpack this analogy in two ways: How do simple groups compare to prime numbers? How does the composition of simple groups compare to the composition of prime numbers? The former analogy is […] Analogy between prime numbers and simple groups first appeared on John D. Cook.  ( 6 min )
  • Open

    Any alternative tools to Otter.ai?
    Hey, long story short, I've used Otter.ai for recording and transcribing my ideas on the fly, and it's really, really good! The only thing it is missing for my use case is the ability to edit the transcripts (remove some parts, for example) and then have those pieces removed from the audio file as well, so you can see how long the actual useful part is. I also need it to have an app, since the whole point of doing this is catching ideas that just rush to my head. Apparently DeScribe has this option, but I haven't tried it and it doesn't work on mobile anyway. I know it's probably not available, but does anyone know any services similar to this? I don't need an AI bot, don't care about integration with other apps, and will not use it for meetings. TLDR: I just want an app to be able to record and then transcribe my ideas, and then allow me to edit/fine-tune the transcript and have the audio file be edited in the same way as well. Thanks! submitted by /u/reza2kn [link] [comments]  ( 9 min )
    Is there an AI I can use where I can upload vocals of a song I've written and have a backing track made for it?
    I have lots of lyrics I've written with the melody but I don't know how to play an instrument. submitted by /u/82brighteyes [link] [comments]  ( 9 min )
    Generate Famous Person with a Random T-Shirt
    Hello all, Is it possible to use a tool or site for free that generates any random historical figure with a shirt of my choosing? Thank you all submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Adversarial AI Attacks: Hidden Threats
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    Prompt Chaining: Elevating Task Automation with LLMs
    👋 Hey Reddit! Let's dive into the realm of Prompt Chaining. If you want to check out more prompt chain examples, then we invite you to join our community at r/PromptWizards. 🔗 Prompt Chaining: More Than Meets the Eye In the world of AI interaction, Q&A sessions with ChatGPT are thrilling. They offer fascinating glimpses into AI's creative potential and can even transform into a productive brainstorming session. But what happens when we need reliable, consistent outputs, especially for applied use cases? Enter Prompt Chaining. Prompt Chaining is a technique that breaks down complex tasks into manageable sub-steps and induces a chain reaction of responses. It allows us to use the output of one prompt as the input for the next, thereby creating a coherent, consistent, and reliable chai…  ( 10 min )
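The core idea, using the output of one prompt as the input for the next, can be sketched in a few lines. This is only an illustration: `fake_llm` is a hypothetical stand-in for a real model call, and `run_chain` and the templates are made-up names, not part of any library or of the original post.

```python
from typing import Callable, List


def run_chain(llm: Callable[[str], str], templates: List[str], initial_input: str) -> str:
    """Run prompts in order, substituting each step's output into the next template."""
    current = initial_input
    for template in templates:
        prompt = template.format(input=current)
        current = llm(prompt)
    return current


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: it just upper-cases the prompt.
    return prompt.upper()


templates = ["Summarize: {input}", "Translate to French: {input}"]
result = run_chain(fake_llm, templates, "prompt chaining")
print(result)
```

Passing the model call in as a function keeps the chain logic testable without any network access; a real client would replace `fake_llm`.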
    AI for realistic images generated from pictures
    I would like to make realistic stuff using screenshots I took in video games. I know there are plenty of text-to-image AI tools, but are you guys familiar with image-to-image ones? submitted by /u/LauraLuna99 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Founder and CEO Jensen Huang Returns to Denny’s Where NVIDIA Launched a Trillion-Dollar Vision
    Talk about a Grand Slam. Denny’s CEO Kelli Valade was joined Tuesday by NVIDIA CEO Jensen Huang to unveil a plaque at the Silicon Valley Denny’s where NVIDIA’s founders hatched their idea for a chip that would enable realistic 3D graphics on personal computers. “This is a place where we fuel ideas. Your story is […]  ( 6 min )
    AI Power Players: GeForce and NVIDIA RTX GPUs Supercharge Creativity, Gaming, Development, Productivity and More
    From gaming to creating to everyday productivity, NVIDIA RTX graphics cards feature specialized Tensor Cores that deliver cutting-edge performance and transformative capabilities for AI.  ( 7 min )
  • Open

    DSC Weekly 26 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 26 September 2023 appeared first on Data Science Central.  ( 20 min )
    Doing graph + tabular analytics directly on modern data lakes
    A podcast with Weimo Liu and Sam Magnus of PuppyGraph. Open source Apache Iceberg, Hudi and Delta Lake have made it possible to dispense with the complexities and duplication of data warehousing. Instead of requiring time-consuming extract, transform and load (ETL) procedures, these large table formats make it simple to tap S3 and other repositories… The post Doing graph + tabular analytics directly on modern data lakes appeared first on Data Science Central.  ( 20 min )
  • Open

    My agent does not learn the most obvious task. Please help me figure out why!
    I am very puzzled as to the results I have observed today, after running an extremely simple environment and receiving really bad results. I am probably doing something wrong, and would like to ask for your wisdom to assist me in figuring out what I am doing wrong. I will not describe the entire task since that is a long story; I will just say that I started by doing something complex (a multi-objective reward), and when it failed I decided to try something extremely simple ("because it will surely work and I can proceed from there..."). To my surprise, the agent was not able to perform even that very simple task. That simple task is the following: at each step, choose a subset of items. Each item has a value, and the goal is to maximize the overall value (that is, at the end of the traje…  ( 10 min )
    Learning to code?
    I've just started diving into the world of coding over the past week, and I've been using various tools like YouTube videos, Visual Basic, GPT-3.5, Bard, and Bing to help me learn the ropes. It's been a bit of a journey, and I've definitely picked up some understanding along the way, especially when it comes to libraries. But, you know, there's only so much you can really learn from AI models like GPT or other chatbots. Most of my progress has involved me taking bits and pieces of code I found here and there and trying to piece them together, even if it sometimes felt like making a digital spaghetti dish! One project I tackled involved using Stable_baselines3 PPO with ADAM optimization to play the classic game Flappy Bird. It was a bit of a wild ride, taking about 6-7 hours of my time, an…  ( 10 min )
  • Open

    Build and deploy ML inference applications from scratch using Amazon SageMaker
    As machine learning (ML) goes mainstream and gains wider adoption, ML-powered inference applications are becoming increasingly common to solve a range of complex business problems. The solution to these complex business problems often requires using multiple ML models and steps. This post shows you how to build and host an ML application with custom containers […]  ( 13 min )
  • Open

    Google Research embarks on effort to map a mouse brain
    Posted by Michał Januszewski, Research Scientist, Google Research The human brain is perhaps the most computationally complex machine in existence, consisting of networks of billions of cells. Researchers currently don’t understand the full picture of how glitches in its network machinery contribute to mental illnesses and other diseases, such as dementia. However, the emerging connectomics field, which aims to precisely map the connections between every cell in the brain, could help solve that problem. While maps have only been created for simpler organisms, technological advances for mapping even larger brains can enable us to understand how the human brain works, and how to treat brain diseases. Today, we're excited to announce that the Connectomics team at Google Research and …  ( 92 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )
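To make the update rule concrete, here is a minimal sketch of gradient descent fitting a line by minimizing mean squared error. It is written in plain Python rather than PyTorch so that it runs with no dependencies; the function name, learning rate, step count, and toy data are assumptions chosen for illustration and are not taken from the linked tutorial.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

In a framework such as PyTorch, the two hand-written gradient lines would typically be replaced by automatic differentiation; the update step itself stays the same.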

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional matrices. Like a two-dimensional matrix, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on the Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations, where a tensor can be a number, a matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-10-26T00:42:18.408Z osmosfeed 1.15.1